El architecture update - reth (#242)

* RLP wiki page - first commit
* data serialization section is added
* need for RLP in Ethereum section is addecd
* Fix overview image
* RL encoding/decoding sections are added
* RLP tools and resources sections are added.
* fixed the sidebar conflicts
* fixed the sidebar conflicts
* fixed the sidebar conflicts
* fixed the sidebar conflicts
* fixed the sidebar conflicts
* fixed the sidebar conflicts
* RLP links to sidebar is added
* typos for wordlist is fixed
* typos for wordlist is fixed
* Update docs/wiki/EL/RLP.md
  Co-authored-by: Mário Havel <[email protected]>
* Update docs/wiki/EL/RLP.md
  Co-authored-by: Mário Havel <[email protected]>
* Updated after the review
* Link RLP page in architecture doc
* Cross-links & move RLP heading under DataStructure
* additions to the intro
* Reth : added codecs, DB abstractions, cursor
* ..
* Typo & Phrasing improvement
* Add Reth's tables section from week 7
  This completes the storage section from week 7
* Typos and wordlist
* Typos and wordlist
* Update wordlist

---------

Co-authored-by: Nagu Thogiti <[email protected]>
Co-authored-by: Mário Havel <[email protected]>
eth-protocol-fellows · May 1, 2024 · 3369c66
1 parent 688777a
Showing 2 changed files with 74 additions and 21 deletions.
diff --git a/docs/wiki/EL/clients/reth.md b/docs/wiki/EL/clients/reth.md
@@ -24,8 +24,55 @@ The image represents a rough component flow of Reth's architecture:
 - **BlockchainTree**: When we are nearing the end of the chain during the syncing process, we transition to the blockchain tree. The synchronization occurs close to the tip, when state root validation and execution take place in memory.
 - **Database**: When a block gets canonicalized, it is moved to the database
 - **Provider**: An abstraction over database that provides utility functions to help us avoid directly accessing the keys and values of the underlying database.
-- **Downloader**: Retrieves blocks and headers using peer-to-peer(P2P) networks. This tool is utilized by the pipeline during its initial two stages and by the engine in the event that it need to bridge the gap at the tip.
+- **Downloader**: Retrieves blocks and headers over the peer-to-peer (P2P) network. It is used by the pipeline during its first two stages, and by the engine when it needs to bridge a gap at the tip.
 - **P2P**: When we approach the tip, we transfer the transactions we have read over P2P to the transaction pool.
 - **Transaction Pool**: Includes DDoS mitigation measures. Consists of transactions arranged in ascending order based on the gas price preferred by the users.
 - **Payload Builder**: Extracts the initial n transactions in order to construct a fresh payload.
 - **Pruner**: Allows us to have a full node. Once the block has been canonicalized by the blockchain tree, we must wait for an additional 64 blocks for it to reach finalization. Once the finalization process is complete, we can be certain that the block will not undergo reorganization. Therefore, if we are operating a full node, we have the option to eliminate the old block using the pruner.
+
+## Storage
+
+Reth primarily uses the mdbx database. On top of it, Reth provides several abstractions that add data transformation, compression, iteration, writing, and querying functionality. These abstractions are designed so that Reth can replace its underlying database, mdbx, with minimal changes to the existing storage abstractions.
+
+**Codecs**
+
+This [crate](https://github.com/paradigmxyz/reth/tree/main/crates/storage/codecs) makes it possible to define different codecs for different purposes. The primary codec is the [Compact trait](https://github.com/paradigmxyz/reth/blob/6d7cd53ad25f0b79c89fd60a4db2a0f2fe097efe/crates/storage/codecs/src/lib.rs#L43), which compresses data: unsigned integers are compressed by dropping their leading zeros, and the same trait is used for structures such as access lists, headers, etc.
+
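To illustrate the idea of compressing unsigned integers by dropping their leading zeros, here is a minimal sketch. It is not Reth's `Compact` implementation: the function names `compact_encode_u64` and `compact_decode_u64` are hypothetical, and the sketch assumes the encoded length is stored elsewhere (for example alongside the value) so that the decoder knows how many bytes to read.

```rust
/// Hypothetical helper (not part of Reth): encode a `u64` as big-endian
/// bytes with all leading zero bytes removed.
fn compact_encode_u64(value: u64) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    // Index of the first non-zero byte; for zero, nothing is kept at all.
    let first_nonzero = bytes.iter().position(|b| *b != 0).unwrap_or(bytes.len());
    bytes[first_nonzero..].to_vec()
}

/// Hypothetical helper (not part of Reth): restore the `u64` by padding the
/// stripped bytes back to eight bytes. The caller must pass exactly the
/// bytes produced by `compact_encode_u64`.
fn compact_decode_u64(buf: &[u8]) -> u64 {
    assert!(buf.len() <= 8, "a u64 never needs more than 8 bytes");
    let mut bytes = [0u8; 8];
    bytes[8 - buf.len()..].copy_from_slice(buf);
    u64::from_be_bytes(bytes)
}

fn main() {
    for value in [0u64, 1, 255, 256, 1_000_000, u64::MAX] {
        let encoded = compact_encode_u64(value);
        // Round-trip check: decoding must give back the original value.
        assert_eq!(compact_decode_u64(&encoded), value);
        println!("{value} -> {} byte(s)", encoded.len());
    }
}
```

Small values, which are common for fields such as nonces or block numbers, take far fewer than the full eight bytes under this scheme.
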
+**DB Abstractions**
+
+The [database trait](https://github.com/paradigmxyz/reth/blob/e158542d31bf576e8a6b6e61337b62f9839734cf/crates/storage/db/src/abstraction/database.rs#L12) is the fundamental abstraction. It opens either read-only or read/write transactions on the low-level database.
+
+The [cursor](https://github.com/paradigmxyz/reth/blob/e158542d31bf576e8a6b6e61337b62f9839734cf/crates/storage/db/src/abstraction/cursor.rs#L13) enables iteration over the values in the database and offers a fast way to retrieve transactions or blocks. It is particularly useful when calculating merkle roots, as sequential value access is significantly faster than random seeking. Likewise, when there is a large amount of data to write, sorting it first and then writing it in order is much faster. The cursor lets us take advantage of this by providing convenience functions for writing either sorted or unsorted data.
+
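The access pattern a cursor enables (seek once, then read neighbouring entries in key order) can be sketched with the standard library alone. This is only an analogy: a `BTreeMap` stands in for a sorted table, and `walk_from` is a hypothetical helper, not a function of Reth's cursor trait.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a cursor walk: position at `start` once, then
/// read up to `limit` entries sequentially in ascending key order.
fn walk_from(table: &BTreeMap<u64, String>, start: u64, limit: usize) -> Vec<(u64, String)> {
    table
        .range(start..) // one seek to the first key >= start
        .take(limit) // then purely sequential reads
        .map(|(key, value)| (*key, value.clone()))
        .collect()
}

fn main() {
    // A toy "Headers"-like table keyed by block number.
    let mut table = BTreeMap::new();
    for number in 0..10u64 {
        table.insert(number, format!("header-{number}"));
    }

    // Ask for five entries starting at block 7; only blocks 7, 8 and 9 exist.
    let rows = walk_from(&table, 7, 5);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows[0], (7, "header-7".to_string()));
    for (number, header) in &rows {
        println!("{number}: {header}");
    }
}
```

The alternative, looking up each block number with a separate random seek, would repeat the positioning work for every entry, which is the cost the cursor avoids.
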
+**Tables**
+
+| Table                      | Key                                 | Value                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+| -------------------------- | ----------------------------------- | ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| CanonicalHeaders           | BlockNumber                         | HeaderHash              | Stores the canonical header hash indexed by block number. |
+| HeaderTerminalDifficulties | BlockNumber                         | CompactU256             | Is responsible for storing the total difficulty value obtained from a block header. Although it is commonly employed in proof-of-work systems, it is currently not in use.                                                                                                                                                                                                                                                                                                        |
+| HeaderNumbers              | BlockHash                           | BlockNumber             | A utility table that stores the block number associated with a header hash. |
+| Headers                    | BlockNumber                         | Header                  | Stores header bodies.                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| BlockBodyIndices           | BlockNumber                         | StoredBlockBodyIndices  | Stores the block body indices, which contain the transaction indexes and the transaction count. This allows us to determine which transaction numbers are included in the block. |
+| BlockOmmers                | BlockNumber                         | StoredBlockOmmers       | Stores the uncles/ommers of the block, which are the side blocks that got included (used in proof-of-work). |
+| BlockWithdrawals           | BlockNumber                         | StoredBlockWithdrawals  | Stores the block withdrawals.                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+| Transactions               | TxNumber                            | TransactionSignedNoHash | Stores the transaction body indexed by the ordinal transaction number. This information includes the total number of transactions and the number of transactions that were executed. It also lets us easily retrieve a single transaction. |
+| TransactionHashNumbers     | TxHash                              | TxNumber                | Stores the transaction number indexed by the transaction hash.                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| TransactionBlocks          | TxNumber                            | BlockNumber             | Stores the mapping of the highest transaction number to the block number. Allows us to fetch the block number for a given transaction number. |
+| Receipts                   | TxNumber                            | Receipt                 | Stores transaction receipts indexed by transaction number.                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| Bytecodes                  | B256                                | Bytecode                | Stores the bytecode of all smart contracts. Multiple accounts can have identical bytecode, so a reference counter is needed. |
+| PlainAccountState          | Address                             | Account                 | Stores the current state of an [Account](https://github.com/paradigmxyz/reth/blob/fb960fb3e45e11c24125ccb4bd93f2e2e21ce271/crates/primitives/src/account.rs#L15), the plain state, indexed by the Account address. The plain state is updated during the execution stage.                                                                                                                                                                                                         |
+| PlainStorageState          | Address , SubKey = B256             | StorageEntry            | Stores the current value of a storage key; the sub-key is the hash of the storage key. Concerning sub-keys: mdbx supports dup tables (duplicate values inside tables), which can lead to faster access to some values. |
+| AccountsHistory            | ShardedKey<Address>                 | BlockNumberList         | Stores pointers to the block changesets that contain modifications for each account key. Each account is associated with a record of modifications, represented as a list of blocks. For example, if we want to retrieve the account balance at block 1,000,000, we need to determine the next block where the account was modified. If the next modification occurs at block 1,000,001, we need to fetch the set of changes for that account from the tables below. |
+| StoragesHistory            | StorageShardedKey                   | BlockNumberList         | Stores pointers to the block changesets that contain modifications for each storage key. This allows us to index the changesets and find a change that happened in the history. |
+| AccountChangeSets          | BlockNumber, SubKey = Address       | AccountBeforeTx         | Stores the state of an account prior to any transaction that alters it, such as when the account is created, self-destructed, accessed while empty, or when its balance or nonce is modified. Therefore, for each block number and account address, we have the previous values. |
+| StorageChangeSets          | BlockNumberAddress , SubKey = B256  | StorageEntry            | Preserves the state of a storage entry prior to a specific transaction altering it. Therefore, for each block number, account address and sub-key (the storage key), we can obtain the previous storage value. The execution stage modifies both this table and the one above it. These tables are used for the merkle trie calculations, which require the values to be incremental. They are also used for any history tracing performed by the JSON-RPC API. |
+| HashedAccounts             | B256                                | Account                 | Stores the current state of an account indexed by keccak256(Address). This table is in preparation for merkleization and calculation of the state root. This table and the one below are used by the merkle trie: the first calculation of the merkle trie needs sorted hashed addresses. |
+| HashedStorages             | B256, SubKey = B256                 | StorageEntry            | Stores the current storage values indexed by keccak256(Address), with the hash of the storage key, keccak256(key), as the sub-key. As with the table above, this is useful for merkleization because the hashed addresses/keys are sorted. |
+| AccountsTrie               | StoredNibbles                       | StoredBranchNode        | Stores the current state's Merkle Patricia Tree.                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+| StoragesTrie               | B256 , SubKey = StoredNibblesSubKey | StorageTrieEntry        | From HashedAddress => NibblesSubKey => Intermediate value. This and the above table store the nodes needed for the merkle trie calculation. |
+| TransactionSenders         | TxNumber                            | Address                 | Stores the transaction sender for each transaction. It is needed to speed up the execution stage and allows fetching the signer without performing the computationally expensive transaction signer recovery. |
+| StageCheckpoints           | StageId                             | StageCheckpoint         | Stores the highest synced block number and stage-specific checkpoint of each stage.                                                                                                                                                                                                                                                                                                                                                                                               |
+| StageCheckpointProgresses  | StageId                             | Vec<u8>                 | Stores arbitrary data to keep track of a stage's first-sync progress. This and the above table allow us to know where the stage stopped and to determine what to do next. |
+| PruneCheckpoints           | PruneSegment                        | PruneCheckpoint         | Records the maximum pruned block number and the pruning mode for each segment of the pruning process. This tells us how far we have pruned our data. Pruning removes change sets and their corresponding indexes to eliminate historical data, leaving only the most recent data, i.e. the tip, to be retrieved. |
+| VersionHistory             | u64                                 | ClientVersion           | Stores the history of client versions that have accessed the database with write privileges, indexed by Unix timestamp in seconds. |
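
The historical lookup described for `AccountsHistory` and `AccountChangeSets` can be sketched as follows. This is a simplified model under stated assumptions, not Reth's provider code: the `Db` struct, the `balance_at` function, and the use of a bare `u64` balance in place of `Account`/`AccountBeforeTx` are all hypothetical, and the tables are modelled as in-memory maps.

```rust
use std::collections::{BTreeMap, HashMap};

type Address = [u8; 20];
type BlockNumber = u64;

/// Hypothetical in-memory model of three tables; balances stand in for
/// full account records.
#[derive(Default)]
struct Db {
    /// PlainAccountState: the current (tip) balance per address.
    plain_account_state: HashMap<Address, u64>,
    /// AccountsHistory: sorted block numbers in which the account changed.
    accounts_history: HashMap<Address, Vec<BlockNumber>>,
    /// AccountChangeSets: the balance *before* the given block changed it.
    account_change_sets: BTreeMap<(BlockNumber, Address), u64>,
}

/// Balance of `address` as of the end of `block`.
fn balance_at(db: &Db, address: Address, block: BlockNumber) -> Option<u64> {
    // Find the next block after `block` in which the account was modified.
    let next_change = db
        .accounts_history
        .get(&address)
        .and_then(|blocks| blocks.iter().copied().find(|b| *b > block));

    match next_change {
        // The changeset of that later block holds the value we want.
        Some(changed_in) => db.account_change_sets.get(&(changed_in, address)).copied(),
        // No later modification: the plain (current) state is still valid.
        None => db.plain_account_state.get(&address).copied(),
    }
}

fn main() {
    let addr: Address = [1u8; 20];
    let mut db = Db::default();
    db.plain_account_state.insert(addr, 70);
    db.accounts_history.insert(addr, vec![5, 1_000_001]);
    db.account_change_sets.insert((5, addr), 0);
    db.account_change_sets.insert((1_000_001, addr), 50);

    // Next modification after block 1,000,000 is block 1,000,001.
    assert_eq!(balance_at(&db, addr, 1_000_000), Some(50));
    // Nothing changed after block 2,000,000, so the tip state applies.
    assert_eq!(balance_at(&db, addr, 2_000_000), Some(70));
}
```

The same pattern would apply to storage values with `StoragesHistory` and `StorageChangeSets`, with the storage key as an additional part of the lookup.
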