Commit 575877b (1 parent: 9294921)
chore: language and consistency edits (#55)
Co-authored-by: Markus Legner <[email protected]>
Showing 4 changed files with 138 additions and 101 deletions.

# Encoding, overheads and verification

The following list summarizes the basic encoding and cryptographic techniques used in Walrus:

- **Storage nodes** hold one or many **shards** in a storage epoch out of a larger total (1000, for
  instance). Each shard contains one blob **sliver** for each blob past point of availability. Each
  shard is assigned to a storage node in a storage epoch.

- An [erasure code](https://en.wikipedia.org/wiki/Online_codes) **encode algorithm** takes a blob
  and encodes it as K symbols, such that any fraction p of symbols can be used to reconstruct
  the blob. Each blob sliver contains a fixed number of such symbols.

- Walrus selects p < 1/3 so that a third of symbols and slivers can be used to reconstruct the blob
  by the **decode algorithm**. The matrix used to produce the erasure code is fixed and is the same
  for all blobs in the Walrus system, and encoders have no discretion about it.

- Storage nodes manage one or more shards, and corresponding slivers of each blob are distributed
  to all the storage shards.

  As a result, the overhead of the distributed store is ~5x that of the blob itself, no matter how
  many shards there are. The encoding is systematic, meaning that some storage nodes hold part of
  the original blob, allowing for fast random access reads.

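The threshold property of the encode and decode algorithms above can be sketched with a toy Reed-Solomon-style code over a small prime field. This is an illustration only, not Walrus's actual encoding (which uses a fixed encoding matrix and a production-grade code); the point is simply that any k of the n symbols reconstruct the original k data symbols:

```python
P = 257  # a small prime; a toy stand-in for the field a real code uses

def encode(data, n):
    """Evaluate the degree-(k-1) polynomial with coefficients `data`
    at x = 0..n-1, producing n code symbols."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
            for x in range(n)]

def decode(points, k):
    """Recover the k coefficients from any k (x, y) pairs by Lagrange
    interpolation over GF(P)."""
    points = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1  # running product of (x - xm), low-to-high
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            denom = denom * (xj - xm) % P
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P      # constant term of (x - xm)
                nxt[i + 1] = (nxt[i + 1] + b) % P   # shift for the x term
            basis = nxt
        scale = yj * pow(denom, P - 2, P) % P       # division via Fermat
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

blob = [7, 3, 5]             # k = 3 source symbols
symbols = encode(blob, 9)    # n = 9 symbols, so k/n = 1/3
# Any third of the symbols reconstructs the blob:
assert decode([(x, symbols[x]) for x in (8, 2, 5)], 3) == blob
```

With k/n = 1/3, this mirrors the p < 1/3 choice above: any third of the slivers suffices for the decode algorithm.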
Each blob is also associated with some metadata including a blob ID to allow verification:

- A **blob ID** is computed as an authenticator of the set of all shard data and metadata (byte
  size, encoding, blob hash).

  Walrus hashes a sliver representation in each of the shards and adds the resulting
  hashes into a Merkle tree. Then the root of the Merkle tree is the blob hash used to derive the
  blob ID that identifies the blob in the system.

- Each storage node can use the blob ID to check if some shard data belongs to a blob using the
  authenticated structure corresponding to the blob hash (Merkle tree). A successful check means
  that the data is indeed as intended by the writer of the blob (who might be corrupt).

- When any party reconstructs a blob ID from shard slivers, or accepts any blob claiming
  to be a specific blob ID, it must check that it encodes to the correct blob ID.

  This process involves re-coding the blob using the erasure correction code, and deriving the
  blob ID again to check that the blob matches. This prevents a malformed blob (incorrectly
  erasure coded) from ever being read as a valid blob at any correct recipient.

- A set of slivers equal to the reconstruction threshold belonging to a blob ID that are either
  inconsistent or lead to the reconstruction of a different ID represent an incorrect encoding
  (this happens only if the user that encoded the blob was malicious and encoded it incorrectly).

  Walrus can extract one symbol per sliver to form an inconsistency proof.
  Storage nodes can delete slivers belonging to inconsistently encoded blobs,
  and upon request return either the inconsistency proof or an inconsistency certificate posted
  on chain.

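The blob ID construction and sliver checks described above can be sketched as follows, assuming SHA-256, a simple binary Merkle tree over sliver hashes, and a made-up metadata serialization; the exact hashes, metadata fields, and tree layout in Walrus may differ:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(slivers):
    """Hash each sliver representation and fold pairwise to the blob hash."""
    level = [h(s) for s in slivers]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def blob_id(slivers, byte_size, encoding):
    """Derive the blob ID from the blob hash plus metadata (made-up format)."""
    return h(merkle_root(slivers) + f"{byte_size}:{encoding}".encode())

def merkle_proof(slivers, index):
    """Sibling hashes from leaf `index` up to the root."""
    level, proof = [h(s) for s in slivers], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # (sibling, is_right_child)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_sliver(sliver, proof, root):
    """The check a storage node runs: does this sliver belong to the blob?"""
    node = h(sliver)
    for sibling, is_right in proof:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

slivers = [b"sliver-0", b"sliver-1", b"sliver-2", b"sliver-3", b"sliver-4"]
root = merkle_root(slivers)
assert verify_sliver(b"sliver-3", merkle_proof(slivers, 3), root)
assert not verify_sliver(b"tampered", merkle_proof(slivers, 3), root)
```

A storage node verifies a received sliver with `verify_sliver`; a reader that reconstructs the blob must additionally re-encode it and re-derive the blob ID to rule out an inconsistent encoding.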
# Off-chain operations

While Walrus operations happen off Sui, they might interact with the blockchain flows defining the
resource life cycle.

## Write paths

![Write paths of Walrus](../assets/WriteFlow.png)

Systems overview of writes, illustrated in the previous image:

- A user acquires a storage resource of appropriate size and duration on chain, either by directly
  buying it on the Walrus system object, or a secondary market. A user can split, merge and
  transfer owned storage resources.

- When users want to write a blob, they first erasure code it using encode and compute the
  blob ID. Then they can perform the following steps themselves, or use a publisher to perform steps
  on their behalf.

- The user goes on chain (Sui) and updates a storage resource to register the blob ID with the
  desired size and lifetime. This emits an event, received by storage nodes. After the
  user receives confirmation of the event, they continue the upload.

- The user sends each of the blob slivers and metadata to the storage nodes that currently
  manage the corresponding shards.

- A storage node managing a shard receives a sliver and checks it against the blob ID.
  It also checks that there is a blob resource with the blob ID that is authorized to store
  a blob. If correct, the storage node then signs a statement that it holds the sliver for blob ID
  (and metadata) and returns it to the user.

- The user puts together the signatures returned from storage nodes into an availability certificate
  and sends it on chain. When the certificate is verified on chain, an availability event for the
  blob ID is emitted, and all other storage nodes seek to download any missing shards for the blob
  ID. This event emitted by Sui is the [point of availability (PoA)](./properties.md) for the blob
  ID.

- After the PoA, and without user involvement, storage nodes sync and recover any missing slivers.

The user waits for 2/3 of shard signatures to return to create the certificate of
availability. The rate of the code is below 1/3, allowing for reconstruction even if only 1/3 of
shards return the sliver for a read. Because at most 1/3 of the storage nodes can fail, this ensures
reconstruction if a reader requests slivers from all storage nodes. The full process can
be mediated by a publisher that receives a blob and drives the process to completion.

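The signature collection described above can be sketched as follows, assuming n = 3f + 1 shards and a 2f + 1 quorum (the "2/3 of shard signatures"); the message and signature formats here are placeholders, not Walrus's actual wire format:

```python
def quorum(n: int) -> int:
    """Signatures needed: with n = 3f + 1 shards, a 2f + 1 quorum."""
    return 2 * ((n - 1) // 3) + 1

def assemble_certificate(blob_id, signatures, n):
    """Collect (shard, signature) pairs; once a quorum of distinct shards
    has signed, bundle them into an availability certificate."""
    distinct = dict(signatures)          # one signature per shard
    if len(distinct) < quorum(n):
        return None                      # keep waiting for more nodes
    return {"blob_id": blob_id, "signers": sorted(distinct)}

# With 1000 shards, 667 signatures (~2/3) are required:
assert quorum(1000) == 667
assert assemble_certificate("0xblob", [(i, f"sig{i}") for i in range(667)], 1000)
```

The returned certificate is what the user sends on chain to trigger the availability event.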
## Refresh availability

Because no content data is required to refresh the duration of storage, refresh is conducted fully
on chain within the protocol. To request an extension to the availability of a blob, a user provides
an appropriate storage resource. Upon success this emits an event that storage nodes receive to
extend the time for which each sliver is stored.

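As a rough illustration of the bookkeeping involved, here is a toy sketch assuming a per-blob expiry registry; `registry`, `extend_availability`, and the event shape are all made up for illustration, since the real storage resource and event are Sui Move constructs:

```python
def extend_availability(registry, blob_id, extra_epochs):
    """Consume a storage resource covering `extra_epochs` more epochs and
    emit the event that storage nodes use to extend sliver retention."""
    if blob_id not in registry:
        raise KeyError("blob ID is not registered")
    registry[blob_id] += extra_epochs
    return {"event": "BlobExtended", "blob_id": blob_id,
            "end_epoch": registry[blob_id]}

registry = {"0xblob": 12}   # hypothetical blob expiring at epoch 12
event = extend_availability(registry, "0xblob", 5)
assert event["end_epoch"] == 17
```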
## Inconsistent resource flow

When a correct storage node tries to reconstruct a shard, it might fail if the encoding of a blob ID
past [PoA](./properties.md) was incorrect; in that case it instead extracts an inconsistency proof
for the blob ID. It then uses the proof to create an inconsistency certificate and uploads it on
chain. The flow is as follows:

- A storage node fails to reconstruct a shard, and instead holds an inconsistency proof.

- The storage node sends the blob ID and inconsistency proof to all storage nodes of the Walrus
  epoch. The receiving storage nodes verify the proof and sign it.

- The storage node aggregates the signatures into an inconsistency certificate and sends it to the
  Walrus smart contract, which verifies it and emits an inconsistent resource event.

- Upon receiving an inconsistent resource event, correct storage nodes delete sliver data for the
  blob ID and record in the metadata to return `None` for the blob ID for the
  [availability period](./properties.md). No storage attestation challenges are issued for this
  blob ID.

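The certificate step of this flow can be sketched as follows, assuming a 2f + 1 signature quorum out of n = 3f + 1 nodes and modeling proof verification and signing as plain callables; every name here is illustrative:

```python
def inconsistency_certificate(blob_id, proof, nodes, verify, sign):
    """Broadcast the proof to all nodes; each node that verifies it signs.
    With enough signatures (assumed quorum: 2f + 1 of n = 3f + 1), the
    certificate can go to the smart contract, which emits the event."""
    sigs = [sign(node, blob_id) for node in nodes if verify(proof)]
    if len(sigs) < 2 * ((len(nodes) - 1) // 3) + 1:
        return None
    return {"event": "InconsistentBlob", "blob_id": blob_id, "sigs": sigs}

nodes = list(range(4))                     # n = 4, f = 1, quorum = 3
cert = inconsistency_certificate("0xbad", b"proof", nodes,
                                 verify=lambda p: True,
                                 sign=lambda node, b: (node, b))
assert cert is not None
```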
A blob ID that is inconsistent always resolves to `None` upon reading: this is because
the read process re-encodes the received blob to check that the blob ID is correctly derived from a
consistent encoding. This means that an inconsistency proof reveals only a true fact to storage
nodes (that do not otherwise run decoding), and does not change the output of read in any case.

Note, however, that partial reads leveraging the systematic nature of the encoding might return
partial data for inconsistently encoded files. Thus, if consistency and availability of reads is
important, dApps should do full reads rather than partial reads.

## Read paths

A user can read stored blobs either directly or through a cache. The direct user journey is
discussed here because this is also how the cache operates in case of a cache miss. Assume that most
reads happen through caches for blobs that are hot, and do not result in requests to storage nodes.

- The reader gets the metadata for the blob ID from any storage node, and authenticates it using
  the blob ID.

- The reader then sends a request to the storage nodes for the shards corresponding to the blob ID,
  and waits for f+1 to respond. Sufficient requests are sent in parallel to ensure low latency for
  reads.

- The reader authenticates the slivers returned with the blob ID, reconstructs the blob, and decides
  whether the contents are a valid blob or inconsistent.

- Optionally, for a cache, the result is cached and can be served without reconstruction until it is
  evicted from the cache. Requests for the blob to the cache return the blob contents, or a proof
  the blob is inconsistently encoded.

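The quorum logic of the read path above can be sketched as follows, assuming n = 3f + 1 shards and treating sliver authentication and blob reconstruction as callables supplied by the reader (both are placeholders here):

```python
def slivers_needed(n: int) -> int:
    """With n = 3f + 1 shards and code rate below 1/3, f + 1 slivers decode."""
    return (n - 1) // 3 + 1

def read_blob(blob_id, responses, n, authenticate, reconstruct):
    """Keep only slivers that authenticate against the blob ID, then decode
    once f + 1 verified slivers are available."""
    good = [s for s in responses if authenticate(blob_id, s)]
    need = slivers_needed(n)
    if len(good) < need:
        return None                        # wait for more responses
    blob = reconstruct(good[:need])
    # A full reader must still re-encode `blob` and re-derive the blob ID
    # to rule out an inconsistently encoded blob.
    return blob

assert slivers_needed(10) == 4
assert read_blob("id", list("abcd"), 10,
                 authenticate=lambda i, s: True,
                 reconstruct="".join) == "abcd"
```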
## Challenge mechanism for storage attestation

During an epoch, a correct storage node challenges all shards to provide blob slivers past PoA:

- The list of available blobs for the epoch is determined by the sequence of Sui events up
  to the past epoch. Inconsistent blobs are not challenged, and a record proving this status
  can be returned instead.

- A challenge sequence is determined by providing a seed to the challenged shard. The sequence is
  then computed based both on the seed AND the content of each challenged blob ID. This creates a
  sequential read dependency.

- The response to the challenge provides the sequence of shard contents for the blob IDs in a
  timely manner.

- The challenger node uses thresholds to determine whether the challenge was passed, and reports
  the result on chain.

- The challenge/response communication is authenticated.

Challenges provide some reassurance that the storage node can actually recover shard data in a
probabilistic manner, avoiding storage nodes getting payment without any evidence they might
retrieve shard data. The sequential nature of the challenge and some reasonable timeout also ensure
that the process is timely.
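The sequential read dependency of the challenge sequence can be sketched as follows, assuming SHA-256 and a simple mapping from blob IDs to sliver bytes; the real seed format and scoring thresholds are not specified here:

```python
import hashlib

def challenge_sequence(seed: bytes, shard: dict, length: int):
    """`shard` maps blob IDs to stored sliver bytes. Each challenge index
    depends on the seed and the content read for the previous challenge,
    so the responder must actually read the data, in order."""
    blob_ids = sorted(shard)
    state, chosen = seed, []
    for _ in range(length):
        idx = int.from_bytes(hashlib.sha256(state).digest(), "big") % len(blob_ids)
        blob = blob_ids[idx]
        chosen.append(blob)
        # Chain in the content just read: a node missing this sliver
        # cannot compute the rest of the sequence.
        state = hashlib.sha256(state + shard[blob]).digest()
    return chosen

shard = {"blob-a": b"data-a", "blob-b": b"data-b", "blob-c": b"data-c"}
assert challenge_sequence(b"seed", shard, 5) == challenge_sequence(b"seed", shard, 5)
```

Because each step hashes in the sliver just read, a node that dropped any challenged sliver cannot produce the remainder of the sequence within the timeout.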