From 023690d05e3c4e092468a0531d4c903dabbfd1eb Mon Sep 17 00:00:00 2001 From: danda Date: Tue, 16 Jul 2024 17:07:33 -0700 Subject: [PATCH 1/3] docs: danda's first read edits --- docs/src/consensus.md | 4 +- docs/src/consensus/transaction.md | 6 +-- docs/src/contributing/git-workflow.md | 16 +++--- docs/src/neptune-core/events.md | 3 +- docs/src/neptune-core/overview.md | 71 +++++++++++++++++---------- docs/src/neptune-core/syncing.md | 6 +-- 6 files changed, 60 insertions(+), 46 deletions(-) diff --git a/docs/src/consensus.md b/docs/src/consensus.md index 55a5d5fe..9e0a952d 100644 --- a/docs/src/consensus.md +++ b/docs/src/consensus.md @@ -1,10 +1,10 @@ # Consensus -Neptune achieves succinctness by requiring STARK proofs to certify most of the consensus-critical logic. As a consequence, verifying and even running a full node is cheap. The tradeoff is that someone has to produce these STARK proofs, and this burden ultimately falls on the miner. +Neptune achieves succinctness by requiring STARK proofs to certify most of the consensus-critical logic. As a consequence, verifying and even running a full node is cheap. The tradeoff is that someone has to produce these STARK proofs, and this burden ultimately falls most heavily on the miner (for aggregated block transactions) and to a lesser extent on the sender (for individual transactions). The particular proof system that Neptune uses is [Triton VM](https://triton-vm.org/). The particular computations that are proven (and verified) as part of consensus logic are documented here. -Consensus is the feature of a network whose nodes overwhelmingly agree on the current contents of a database, typically a blockchain. This database is append-only. While reorganizations can happen they are expected to be rare and shallow. Every once in a while, a new block is added. The block contains, among other things, a transaction. [Block](./consensus/block.md)s and [Transaction](./consensus/transaction.md)s are the key data objects that consensus pertains to. The *consensus logic* determines which blocks and transactions are *valid* and *confirmable*. +Consensus is the feature of a network whose nodes overwhelmingly agree on the current contents of a database, typically a blockchain. This database is append-only. While reorganizations can happen they are expected to be rare and shallow. Every once in a while, a new block is added. The block body contains a single transaction that aggregates together all inputs and outputs of individual user transactions since the previous block. [Block](./consensus/block.md)s and [Transaction](./consensus/transaction.md)s are the key data objects that consensus pertains to. The *consensus logic* determines which blocks and transactions are *valid* and *confirmable*. Note that there is a distinction between *valid* and *confirmable*. Validity refers to the internal consistency of a data object. Confirmable refers to its current relation to the rest of the blockchain. For example, having insufficient proof-of-work or including a double-spending transaction makes a block invalid. But a block can be both valid and unconfirmable, for instance if its timestamp is too far into the future. STARK proofs are capable of establishing validity but not confirmability. 
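+
+The difference can be made concrete with a small sketch. The types and functions below are purely illustrative (they are not the actual `neptune-core` API); the point is that validity is a property of the block in isolation, which is what a STARK proof can attest to, while confirmability additionally depends on the current chain context and can change over time.
+
+```rust
+// Illustrative only; invented types, not neptune-core's real data structures.
+struct Block {
+    timestamp: u64,
+    // ... transactions, proofs, proof-of-work, etc.
+}
+
+struct ChainContext {
+    now: u64,
+    max_future_drift: u64,
+}
+
+/// Validity: internal consistency of the block alone, e.g. sufficient
+/// proof-of-work and no double-spending transaction. A STARK proof can
+/// certify this part.
+fn is_valid(_block: &Block) -> bool {
+    true // placeholder for the actual checks
+}
+
+/// Confirmability: the block's current relation to the rest of the
+/// blockchain, e.g. its timestamp is not too far into the future.
+/// This depends on data outside the block, so it cannot be frozen into a proof.
+fn is_confirmable(block: &Block, chain: &ChainContext) -> bool {
+    block.timestamp <= chain.now + chain.max_future_drift
+}
+```
+
+A block that passes `is_valid` but fails `is_confirmable`, for instance because its timestamp lies too far in the future, may become confirmable later without the block itself changing.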
diff --git a/docs/src/consensus/transaction.md b/docs/src/consensus/transaction.md
index 68ac10d5..537bb03a 100644
--- a/docs/src/consensus/transaction.md
+++ b/docs/src/consensus/transaction.md
@@ -4,7 +4,7 @@ A transaction kernel consists of the following fields:
  - `inputs: Vec` The commitments to the UTXOs that are consumed by this transaction.
  - `outputs: Vec` The commitments to the UTXOs that are generated by this transaction.
- - `public_announcements: Vec` One or many strings of data to broadcased to the world.
+ - `public_announcements: Vec` A list of self-identifying strings broadcast to the world. These may contain encrypted secrets, but only the recipient(s) can ascertain that.
  - `fee: NeptuneCoins` A reward for the miner who includes this transaction in a block.
  - `coinbase: Option` The miner is allowed to set this field to a mining reward which is determined by various variable network parameters.
  - `timestamp: Timestamp` When the transaction took or takes place.
@@ -15,7 +15,7 @@ A transaction kernel consists of the following fields:
 A transaction is *valid* if (any of):
  - ***a)*** it has a valid witness (including spending keys and mutator set membership proofs)
- - ***b)*** it has valid proofs for each subprogram (subprograms establish things like the owners consent to this transaction, there is no inflation, etc.)
+ - ***b)*** it has valid proofs for each subprogram (subprograms establish things like the owner's consent to this transaction, there is no inflation, etc.)
  - ***d)*** it has a single valid proof that the entire witness is valid (so, a multi-claim proof of all claims listed in (b))
  - ***e)*** it has a single valid proof that the transaction originates from merging two valid transactions
  - ***f)*** it has a single valid proof that the transaction belongs to an integral mempool, *i.e.*, one to which only valid transactions were added
@@ -142,7 +142,7 @@ Two transactions can be merged into one. Among other things, this operation repl
 ### E: Proof of Integral Mempool Operation

-A transaction is valid if it was ever added to an integral mempool. The motivating use case for this feature is that mempool operators can delete transaction proofs as long as they store and routinely update one
+A transaction is valid if it was ever added to an integral mempool. The motivating use case for this feature is that mempool operators can delete transaction proofs as long as they store and routinely update one

 An integral mempool is an MMR containing transactions *kernels*, along with a proof of integral history. The integral mempool can be updated in only one way: by appending a valid transaction.
- `append : (old_mmr : Mmr) × (old_history_proof: StarkProof) × (tx : Transaction) ⟶ (new_mmr : Mmr) × (new_history_proof : StarkProof)` diff --git a/docs/src/contributing/git-workflow.md b/docs/src/contributing/git-workflow.md index 10a18728..d8a452b3 100644 --- a/docs/src/contributing/git-workflow.md +++ b/docs/src/contributing/git-workflow.md @@ -7,15 +7,13 @@ We follow a standard [GitHub Flow](https://docs.github.com/en/get-started/using- It can be visualized like this: ``` -master --------*----------------------*> - \ -------- - \ dev / topic \ - ----*----------------------*---------------> - \ release \ release - ------------------> ---------> - \ hotfix / - -------- + -------- +master / topic \ +----*----------------------*---------------> + \ release \ release + ------------------> ---------> + \ hotfix / + -------- ``` ### master branch (aka trunk) diff --git a/docs/src/neptune-core/events.md b/docs/src/neptune-core/events.md index 62b32c6f..dfb86076 100644 --- a/docs/src/neptune-core/events.md +++ b/docs/src/neptune-core/events.md @@ -1,7 +1,6 @@ # Events -The Neptune Core client can be seen as an event-driven client. Below is a list of all the events, and the messages that -these events create. +neptune-core can be seen as an event-driven program. Below is a list of all the events, and the messages that these events create. ## Events diff --git a/docs/src/neptune-core/overview.md b/docs/src/neptune-core/overview.md index d707c5c5..ac5468cc 100644 --- a/docs/src/neptune-core/overview.md +++ b/docs/src/neptune-core/overview.md @@ -1,45 +1,42 @@ # Neptune Core Overview -Neptune Core is a multi-threaded and asynchronous program using the [tokio](https://tokio.rs/tokio/tutorial) framework for concurrent primitives. It connects to other clients through TCP/IP and accepts calls to its RPC server through HTTP/JSON. Development also includes an RPC client that issues commands parsed from its command-line interface. +`neptune-core` uses the [tokio](https://tokio.rs/tokio/tutorial) async framework and tokio's multi-threaded executor which assigns tasks to threads in a threadpool and requires the use of thread synchronization primitives. We refer to spawned tokio tasks as `tasks` but you can think of them as threads if that fits your mental model better. Note that a tokio task may (or may not) run on a separate operating system thread from that task that spawned it, at tokio's discretion. -## Threads of the Neptune Core binary -There are four classes of threads: +`neptune-core` connects to other clients through TCP/IP and accepts calls to its RPC server via [tarpc](https://github.com/google/tarpc) using json serialization over the [serde_transport](https://docs.rs/tarpc/latest/tarpc/serde_transport/index.html). The project also includes `neptune-cli` a command-line client and `neptune-dashboard`, a cli/tui wallet tool. Both interact with `neptune-core` via the tarpc RPC protocol. + +## Long-lived async tasks of neptune-core binary +There are four classes of tasks: - `main`: handles init and `main_loop` - `peer[]`: handles `connect_to_peers` and `peer_loop` -- `mining`: runs `miner_loop`, has a worker and a monitor thread +- `mining`: runs `miner_loop`, has a worker and a monitor task - `rpc_server[]`: handles `rpc_server` for incoming RPC requests -## Threads of the RPC client binary, the CLI interface -This is a separate program all together with a separate address space. 
This means the `state` object (see further down) is not available, and all data from Neptune Core must be received via RPC.
-It only has one class of threads:
-- `rpc_cli[]`: handles `rpc_cli` for parsing user-supplied command-line arguments and transforms them into outgoing RPC requests.
-
 ## Channels
-The threads can communicate with each other through channels provided by the tokio framework. All communication goes through the main thread. There is e.g. no way for the miner to communicate with peer threads.
+Long-lived tasks can communicate with each other through channels provided by the tokio framework. All communication goes through the main task. For example, there is no way for the miner task to communicate with peer tasks.

 The channels are:
 - peer to main: `mpsc`, "multiple producer, single consumer".
-- main to peer: `broadcast`, messages can only be sent to *all* peer threads. If you only want one peer thread to act, the message must include an IP that represents the peer for which the action is intended.
-- miner to main: `mpsc`. Only one miner thread (the monitor/master thread) sends messages to main. Used to tell the main loop about newly found blocks.
+- main to peer: `broadcast`, messages can only be sent to *all* peer tasks. If you only want one peer task to act, the message must include an IP that represents the peer for which the action is intended.
+- miner to main: `mpsc`. Only one miner task (the monitor/master task) sends messages to main. Used to tell the main loop about newly found blocks.
 - main to miner: `watch`. Used to tell the miner to mine on top of a new block; to shut down; or that the mempool has been updated, and that it therefore is safe to mine on the next block.
-- rpc server to main: `mpsc`: Used to e.g. send a transaction object that is built from client-controlled UTXOs to the main thread where it can be added to the mempool. This channel is also used to shut down the program when the `shutdown` command is called.
+- rpc server to main: `mpsc`: Used to e.g. send a transaction object that is built from client-controlled UTXOs to the main task where it can be added to the mempool. This channel is also used to shut down the program when the `shutdown` command is called.

 ## Global State
-All threads that are part of Neptune Core have access to the global state and they can all read from it. Each type of thread can have its own local state that is not shared across threads, this is **not** what is discussed here.
+All tasks that are part of Neptune Core have access to the global state and they can all read from it. Each type of task can have its own local state that is not shared across tasks; this is **not** what is discussed here.

-The global state has five fields and they each follow some rules and a canonical ordering of these fields exists:
+The global state has five fields and they each follow some rules:
 - `wallet_state` contains information necessary to generate new transactions and print the user's balance.
-- `chain` Blockchain state. Contains information about state of the blockchain, block height, digest of latest block etc. Only `main` thread may update `chain`. `chain` consists of two field:
+- `chain` Blockchain state. Contains information about state of the blockchain, block height, digest of latest block etc. Only `main` task may update `chain`. `chain` consists of two fields:
   - `light_state`, ephemeral, contains only latest block
   - `archival_state`, persistent. `archival_state` consists of data stored both in a database and on disk.
The blocks themselves are stored on disk, and meta-information about the blocks are stored in the `block_index` database. `archival_state` also contains the `archival_mutator_set` which can be used to recover unsynced membership proofs for the mutator set. -- `network`, network state. Consists of `peer_map` for storing in memory info about all connected peers and `peer_databases` for persisting info about banned peers. Both of these can be written to by main or by peer threads. `network` also contains a `syncing` value (only `main` may write) and `instance_id` which is read-only. +- `network`, network state. Consists of `peer_map` for storing in memory info about all connected peers and `peer_databases` for persisting info about banned peers. Both of these can be written to by main or by peer tasks. `network` also contains a `syncing` value (only `main` may write) and `instance_id` which is read-only. - `cli` CLI arguments. The state carries around the CLI arguments. These are read-only. -- `mempool`, in-memory data structure of a set of transactions that have not yet been mined in a block. The miner reads from the `mempool` to find the most valuable transactions to mine. Only the main thread may write to `mempool`. `mempool` comes with a concept of ordering such that only the transactions that pay the highest fee per size are remembered. `mempool` enforces a max size such that its size can be constrained. +- `mempool`, in-memory data structure of a set of transactions that have not yet been mined in a block. The miner reads from the `mempool` to find the most valuable transactions to mine. Only the main task may write to `mempool`. `mempool` comes with a concept of ordering such that only the transactions that pay the highest fee per size are remembered. `mempool` enforces a max size such that its size can be constrained. ## Receiving a New Block -When a new block is received from a peer, it is first validated by the peer thread. If the block is valid and more canonical than the current tip, it is sent to the main thread. The main thread is responsible for updating the `GlobalState` data structure to reflect the new block. This is done by acquiring all relevant locks in the correct order and then calling the respective helper functions with this lock held throughout the updating process. +When a new block is received from a peer, it is first validated by the peer task. If the block is valid and more canonical than the current tip, it is sent to the main task. The main task is responsible for updating the `GlobalState` data structure to reflect the new block. This is done by write-acquiring the single `GlobalStateLock` and then calling the respective helper functions with this lock held throughout the updating process. -There are two pieces of code in the main loop that update the state with a new block: one when new blocks are received from a peer, and one for when the block is found locally by the miner thread. These two functionalities are somewhat similar. In this process all databases are flushed to ensure that the changes are persisted on disk. +There are two pieces of code in the main loop that update the state with a new block: one when new blocks are received from a peer, and one for when the block is found locally by the miner task. These two functionalities are somewhat similar. In this process all databases are flushed to ensure that the changes are persisted on disk. The individual steps of updating the global state with a new block are: 0.   
@@ -52,12 +49,14 @@ The individual steps of updating the global state with a new block are: 5. Update `light_state` with the latest block. 6. Flush all databases 7. Tell miner - - If block was found locally: Tell miner that it can start working on next block since the `mempool` has now been updated with the latest block. + - If block was found locally: Tell miner that it can start working on next block since the `mempool` has now been updated with the latest block. - If blocks were received from peer: Tell miner to start building on top of a new chain tip. ## Spending UTXOs A transaction that spends UTXOs managed by the client can be made by calling the `create_transaction` method on the `GlobalState` instance. This function needs a synced `wallet_db` and a chain tip in `light_state` to produce a valid transaction. +For a working example, see the implementation of the `send_to_many()` RPC method. + ## Scheduled Tasks in Main Loop Different tasks are scheduled in the main loop every N seconds. These currently handle: peer discovery, block (batch) synchronization, and mempoool cleanup. - Peer discovery: This is used to find new peers to connect to. The logic attempts to find peers that have a distance bigger than 2 in the network where distance 0 is defined as yourself; distance 1 are the peers you connect to at start up, and all incoming connections; distance 2 are your peers' peers and so on. @@ -67,25 +66,38 @@ Different tasks are scheduled in the main loop every N seconds. These currently A task for recovering unsynced membership proofs would fit well in here. ## Design Philosophies -- Avoid state-through-instruction-pointer. This means that a request/response exchange should be handled without nesting of e.g. matched messages from another peer. So when a peer thread requests a block from another peer the peer thread must return to the instruction pointer where it can receive *any* message from the peer and not only work if it actually gets the block as the next message. The reasoning behind this is that a peer thread must be able to respond to e.g. a peer discovery request message from the same peer before that peer responds with the requested block. +- Avoid state-through-instruction-pointer. This means that a request/response exchange should be handled without nesting of e.g. matched messages from another peer. So when a peer task requests a block from another peer the peer task must return to the instruction pointer where it can receive *any* message from the peer and not only work if it actually gets the block as the next message. The reasoning behind this is that a peer task must be able to respond to e.g. a peer discovery request message from the same peer before that peer responds with the requested block. ## Central Primitives From `tokio` - `spawn` - `select!` -- `tokio::sync::Mutex` +- `tokio::sync::RwLock` From Std lib: - `Arc` -- `std::sync::Mutex` + +From neptune-core: +- `neptune_core::locks::tokio::AtomicRw` (wraps `Arc`) ## Persistent Memory -We use `rusty-leveldb` for our database layer with a custom-wrapper that makes it more type safe. `rusty-leveldb` allows for atomic writes within *one* database which is equivalent to a table in SQL lingo. So if you want atomic writes across multiple datatypes (you do want this!) you need to put that `enum` into the database and then cast the output type to the correct type. I think this is a low price to pay to achieve atomicity on the DB-layer. 
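+
+A minimal sketch of the batch-update idea that the wrappers described below rely on. The types here are invented for illustration (they are not the real `NeptuneLevelDb` or `DbSchema` API); the point is that several logical writes are staged in memory and committed to the key/value store as one atomic batch, so readers never observe a partial update.
+
+```rust
+// Illustrative sketch only; invented types and keys, not the neptune-core API.
+use std::collections::HashMap;
+
+/// A staged set of writes that is applied all-or-nothing.
+#[derive(Default)]
+struct WriteBatch {
+    puts: Vec<(Vec<u8>, Vec<u8>)>,
+}
+
+impl WriteBatch {
+    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
+        self.puts.push((key, value));
+    }
+}
+
+/// A toy key/value store standing in for leveldb.
+#[derive(Default)]
+struct KvStore {
+    data: HashMap<Vec<u8>, Vec<u8>>,
+}
+
+impl KvStore {
+    /// Apply the whole batch as a single atomic update.
+    fn write(&mut self, batch: WriteBatch) {
+        for (key, value) in batch.puts {
+            self.data.insert(key, value);
+        }
+    }
+}
+
+fn main() {
+    let mut db = KvStore::default();
+    let mut batch = WriteBatch::default();
+    // Stage updates to several logical "tables" (here just key prefixes) ...
+    batch.put(b"schema/tip_height".to_vec(), 42u64.to_be_bytes().to_vec());
+    batch.put(b"schema/utxo_count".to_vec(), 7u64.to_be_bytes().to_vec());
+    // ... and commit them together.
+    db.write(batch);
+    assert_eq!(db.data.len(), 2);
+}
+```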
-Blocks are stored on disk and their position on disk is stored in the `block_index` database. Blocks are read from and written to disk using `mmap`.
+We use `leveldb` for our database layer, with custom wrappers that make it more async-friendly and type-safe, and that emulate multi-table transactions.
+
+`neptune_core::database::NeptuneLevelDb` provides async wrappers for leveldb APIs to avoid blocking async tasks.
+
+`leveldb` is a simple key/value store, meaning it only allows manipulating individual strings. It does, however, provide a batch update facility. `neptune_core::database::storage::storage_schema::DbSchema` leverages these batch updates to provide vector and singleton types that can be manipulated in Rust code and then atomically written to `leveldb` as a single batch update (aka transaction).
+
+Blocks are stored on disk and their position on disk is stored in the `block_index` database. Blocks are read from and written to disk using `mmap`. We wrap all file-system calls with tokio's `spawn_blocking()` so they will not block other async tasks.

 ## Challenges
-- Deadlocks. Solution: always acquire locks in the same order. Note though that locks from `std::sync` may not be held over an `await`. The linter should tell you if you do this. When a function requires more than one lock, **the only correct ordering in which to acquire these locks is the order in which the fields are defined in `GlobalState`. Any deviation from this is a bug.**
+
+- Deadlocks. We only have a single RwLock over the GlobalState. This is encapsulated in struct `GlobalStateLock`. This makes deadlocks pretty easy to avoid by following some simple rules (a minimal sketch of this lock discipline appears at the end of this document):
+
+  1. avoid deadlocking yourself. If a function has read-acquired the global lock, then it must be released before write-acquiring. Likewise, never attempt to write-acquire the lock twice.
+
+  2. avoid deadlocking others. Always be certain that the global lock will be released in timely fashion. In other words if you have some kind of long running task with an event loop that needs to acquire the global lock, ensure that it gets acquired+released inside the loop rather than outside.
+
 - Atomic writing to databases: The archival mutator set is spread across multiple databases due to how the underlying data structures are defined. If one of the databases are updated but the other is not, this will leave the archival mutator set in an invalid state. We could fix this by allowing an archival mutator set to be stored in only one database. We should also add logic to rebuild the archival mutator set state from the `block_index` database and the blocks stored on disk since it can be derived from the blocks. This functionality could be contained in a separate binary, just like we have a binary for the CLI interface in the form of the RPC client.

 ## Tracing
@@ -120,3 +132,8 @@ XDG_DATA_HOME=~/.local/share/neptune-integration-test/0/ RUST_LOG=trace cargo ru
 Note that the client exists quickly, so here the `.pretty()` tracing subscriber is suitable, while `.compact()` is perhaps better for the server.

+# neptune-cli client
+
+`neptune-cli` is a separate program with a separate address space. This means the `state` object (see above) is not available, and all data from Neptune Core must be received via RPC.
+
+`neptune-cli` does not have any long-lived tasks but rather receives individual commands via CLI, sends a query to neptune-core, presents the response, and exits.
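+
+# Example: global lock discipline
+
+The deadlock-avoidance rules listed under "Challenges" above can be illustrated with a minimal sketch. It uses a plain `tokio::sync::RwLock` rather than the `AtomicRw` wrapper, and the state type and loop body are invented; only the acquire/release pattern is the point.
+
+```rust
+// Illustrative sketch only; invented state type and loop, not neptune-core code.
+use std::sync::Arc;
+use std::time::Duration;
+use tokio::sync::RwLock;
+
+#[derive(Default)]
+struct GlobalState {
+    tip_height: u64,
+}
+
+async fn event_loop(state: Arc<RwLock<GlobalState>>) {
+    loop {
+        // Rule 1: don't deadlock yourself. Take the read guard in a narrow
+        // scope and drop it ...
+        let height = {
+            let guard = state.read().await;
+            guard.tip_height
+        };
+        // ... before write-acquiring the same lock.
+        if height < 100 {
+            let mut guard = state.write().await;
+            guard.tip_height += 1;
+        }
+
+        // Rule 2: don't deadlock others. The lock is acquired and released
+        // inside each loop iteration, never held across the whole loop or
+        // across long-running awaits.
+        tokio::time::sleep(Duration::from_millis(100)).await;
+    }
+}
+
+#[tokio::main]
+async fn main() {
+    let state = Arc::new(RwLock::new(GlobalState::default()));
+    tokio::spawn(event_loop(state.clone()));
+    tokio::time::sleep(Duration::from_secs(1)).await;
+    println!("tip height is now {}", state.read().await.tip_height);
+}
+```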
diff --git a/docs/src/neptune-core/syncing.md b/docs/src/neptune-core/syncing.md index f0bc74fd..bf52675a 100644 --- a/docs/src/neptune-core/syncing.md +++ b/docs/src/neptune-core/syncing.md @@ -8,13 +8,13 @@ Synchronization describes the state that a blockchain client can be in. Synchronization is motivated by the way that regular block downloading happens. If a client receives a new block from a peer, the client checks if it knows the parent of this block. If it does not know the parent, then -the client request the parent from the peer. If this parent block is also not known, it requests the parent -of that and so on. In this process all blocks are received in oppossite order from which they are mined, and +the client requests the parent from the peer. If this parent block is also not known, it requests the parent +of that and so on. In this process all blocks are received in opposite order from which they are mined, and the blocks whose parents are not known are kept in memory. To avoid overflowing the memory if thousands of blocks were to be fetched this way, synchronization was built. When synchronization is active, the blocks are fetched in sequential order, from oldest to newest block. -The state that is used to manage synchronization is stored the main thread, the thread that runs at +State that is used to manage synchronization is stored in the main thread which runs at startup. This thread ends up in `main_loop.rs` and stays there until program shutdown. The `MutableMainLoopState` currently consists of two fields: A state to handle peer discovery and a state to From 611ce354182e9e6fe759066b726fad2983527b94 Mon Sep 17 00:00:00 2001 From: danda Date: Wed, 17 Jul 2024 10:44:07 -0700 Subject: [PATCH 2/3] docs: rewrite "Atomic writing to databases" --- docs/src/neptune-core/overview.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/src/neptune-core/overview.md b/docs/src/neptune-core/overview.md index ac5468cc..ce56b139 100644 --- a/docs/src/neptune-core/overview.md +++ b/docs/src/neptune-core/overview.md @@ -98,7 +98,9 @@ Blocks are stored on disk and their position on disk is stored in the `block_ind 2. avoid deadlocking others. Always be certain that the global lock will be released in timely fashion. In other words if you have some kind of long running task with an event loop that needs to acquire the global lock, ensure that it gets acquired+released inside the loop rather than outside. -- Atomic writing to databases: The archival mutator set is spread across multiple databases due to how the underlying data structures are defined. If one of the databases are updated but the other is not, this will leave the archival mutator set in an invalid state. We could fix this by allowing an archival mutator set to be stored in only one database. We should also add logic to rebuild the archival mutator set state from the `block_index` database and the blocks stored on disk since it can be derived from the blocks. This functionality could be contained in a separate binary, just like we have a binary for the CLI interface in the form of the RPC client. +- Atomic writing to databases: `neptune-core` presently writes to the following databases: wallet_db, block_index_db, archival_mutator_set, peer_state. If one of the databases are updated but the other is not, this can leave data in an invalid state. We could fix this by storing all state in a single transactional database. 
+
+note: We should also add logic to rebuild the archival state from the `block_index_db` and the blocks stored on disk since it can be derived from the blocks. This functionality could be contained in a separate binary, just like we have a binary for the CLI interface in the form of the RPC client.

 ## Tracing
 A structured way of inspecting a program when designing the RPC API, is to use tracing, which is a logger, that is suitable for programs with asynchronous control flow.

From 842e7903394c386ebb449d3007ce08f3baedef7b Mon Sep 17 00:00:00 2001
From: danda
Date: Thu, 18 Jul 2024 11:59:03 -0700
Subject: [PATCH 3/3] docs: clarify "Atomic writing to databases"

co-authored with sword-smith
---
 docs/src/neptune-core/overview.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/src/neptune-core/overview.md b/docs/src/neptune-core/overview.md
index ce56b139..d631507c 100644
--- a/docs/src/neptune-core/overview.md
+++ b/docs/src/neptune-core/overview.md
@@ -98,9 +98,9 @@ Blocks are stored on disk and their position on disk is stored in the `block_ind
   2. avoid deadlocking others. Always be certain that the global lock will be released in timely fashion. In other words if you have some kind of long running task with an event loop that needs to acquire the global lock, ensure that it gets acquired+released inside the loop rather than outside.

-- Atomic writing to databases: `neptune-core` presently writes to the following databases: wallet_db, block_index_db, archival_mutator_set, peer_state. If one of the databases are updated but the other is not, this can leave data in an invalid state. We could fix this by storing all state in a single transactional database.
+- Atomic writing to databases: `neptune-core` presently writes to the following databases: wallet_db, block_index_db, archival_mutator_set, peer_state. If one of the databases is updated but the others are not, this can leave data in an invalid state. We could fix this by storing all state in a single transactional database, but that might make the code base less modular.

-note: We should also add logic to rebuild the archival state from the `block_index_db` and the blocks stored on disk since it can be derived from the blocks. This functionality could be contained in a separate binary, just like we have a binary for the CLI interface in the form of the RPC client.
+note: We should also add logic to rebuild the archival state from the `block_index_db` and the blocks stored on disk since it can be derived from the blocks. This functionality could be contained in a separate binary, or a check could be performed at startup.

 ## Tracing
 A structured way of inspecting a program when designing the RPC API, is to use tracing, which is a logger, that is suitable for programs with asynchronous control flow.
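+
+As a rough illustration of how such a tracing subscriber might be set up (a sketch only, assuming the `tracing` and `tracing-subscriber` crates with the `env-filter` feature; `neptune-core`'s actual initialization may differ), a formatter driven by `RUST_LOG` could be installed like this:
+
+```rust
+// Illustrative sketch; neptune-core's actual subscriber setup may differ.
+use tracing::info;
+use tracing_subscriber::EnvFilter;
+
+fn main() {
+    // Honour RUST_LOG (e.g. RUST_LOG=trace), defaulting to "info".
+    tracing_subscriber::fmt()
+        .compact() // .pretty() suits short-lived programs such as the CLI client
+        .with_env_filter(
+            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
+        )
+        .init();
+
+    info!("tracing initialized");
+}
+```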