Commit 575877b (1 parent: 9294921)
chore: language and consistency edits (#55)
Co-authored-by: Markus Legner <[email protected]>
Showing 4 changed files with 138 additions and 101 deletions.

# Encoding, overheads and verification

The following list summarizes the basic encoding and cryptographic techniques used in Walrus:

- **Storage nodes** hold one or many **shards** in a storage epoch out of a larger total (1000, for
  instance). Each shard contains one blob **sliver** for each blob past point of availability. Each
  shard is assigned to a storage node in a storage epoch.

- An [erasure code](https://en.wikipedia.org/wiki/Online_codes) **encode algorithm** takes a blob
  and encodes it as K symbols, such that any fraction p of symbols can be used to reconstruct
  the blob. Each blob sliver contains a fixed number of such symbols.

- Walrus selects p < 1/3 so that a third of symbols and slivers can be used to reconstruct the blob
  by the **decode algorithm**. The matrix used to produce the erasure code is fixed and is the same
  for all blobs in the Walrus system, and encoders have no discretion about it.

- Storage nodes manage one or more shards, and corresponding slivers of each blob are distributed
  to all the storage shards.

  As a result, the overhead of the distributed store is ~5x that of the blob itself, no matter how
  many shards there are. The encoding is systematic, meaning that some storage nodes hold part of
  the original blob, allowing for fast random access reads.

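The threshold property of the encode and decode algorithms above can be sketched with a toy Reed-Solomon-style code over a small prime field. This is an illustration only, not Walrus's actual encoding (which uses a fixed encoding matrix and a production-grade code); the point is simply that any k of the n symbols reconstruct the original k data symbols:

```python
P = 257  # a small prime; a toy stand-in for the field a real code uses

def encode(data, n):
    """Evaluate the degree-(k-1) polynomial with coefficients `data`
    at x = 0..n-1, producing n code symbols."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
            for x in range(n)]

def decode(points, k):
    """Recover the k coefficients from any k (x, y) pairs by Lagrange
    interpolation over GF(P)."""
    points = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1  # running product of (x - xm), low-to-high
        for m, (xm, _) in enumerate(points):
            if m == j:
                continue
            denom = denom * (xj - xm) % P
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P      # constant term of (x - xm)
                nxt[i + 1] = (nxt[i + 1] + b) % P   # shift for the x term
            basis = nxt
        scale = yj * pow(denom, P - 2, P) % P       # division via Fermat
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

blob = [7, 3, 5]             # k = 3 source symbols
symbols = encode(blob, 9)    # n = 9 symbols, so k/n = 1/3
# Any third of the symbols reconstructs the blob:
assert decode([(x, symbols[x]) for x in (8, 2, 5)], 3) == blob
```

With k/n = 1/3, this mirrors the p < 1/3 choice above: any third of the slivers suffices for the decode algorithm.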
Each blob is also associated with some metadata including a blob ID to allow verification:

- A **blob ID** is computed as an authenticator of the set of all shard data and metadata (byte
  size, encoding, blob hash).

  Walrus hashes a sliver representation in each of the shards and adds the resulting
  hashes into a Merkle tree. Then the root of the Merkle tree is the blob hash used to derive the
  blob ID that identifies the blob in the system.

- Each storage node can use the blob ID to check if some shard data belongs to a blob using the
  authenticated structure corresponding to the blob hash (Merkle tree). A successful check means
  that the data is indeed as intended by the writer of the blob (who might be corrupt).

- When any party reconstructs a blob ID from shard slivers, or accepts any blob claiming
  to be a specific blob ID, it must check that it encodes to the correct blob ID.

  This process involves re-coding the blob using the erasure correction code, and deriving the
  blob ID again to check that the blob matches. This prevents a malformed blob (incorrectly
  erasure coded) from ever being read as a valid blob at any correct recipient.

- A set of slivers equal to the reconstruction threshold belonging to a blob ID that are either
  inconsistent or lead to the reconstruction of a different ID represent an incorrect encoding
  (this happens only if the user that encoded the blob was malicious and encoded it incorrectly).

  Walrus can extract one symbol per sliver to form an inconsistency proof.
  Storage nodes can delete slivers belonging to inconsistently encoded blobs,
  and upon request return either the inconsistency proof or an inconsistency certificate posted
  on chain.

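The blob ID construction and sliver checks described above can be sketched as follows, assuming SHA-256, a simple binary Merkle tree over sliver hashes, and a made-up metadata serialization; the exact hashes, metadata fields, and tree layout in Walrus may differ:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(slivers):
    """Hash each sliver representation and fold pairwise to the blob hash."""
    level = [h(s) for s in slivers]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def blob_id(slivers, byte_size, encoding):
    """Derive the blob ID from the blob hash plus metadata (made-up format)."""
    return h(merkle_root(slivers) + f"{byte_size}:{encoding}".encode())

def merkle_proof(slivers, index):
    """Sibling hashes from leaf `index` up to the root."""
    level, proof = [h(s) for s in slivers], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # (sibling, is_right_child)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_sliver(sliver, proof, root):
    """The check a storage node runs: does this sliver belong to the blob?"""
    node = h(sliver)
    for sibling, is_right in proof:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

slivers = [b"sliver-0", b"sliver-1", b"sliver-2", b"sliver-3", b"sliver-4"]
root = merkle_root(slivers)
assert verify_sliver(b"sliver-3", merkle_proof(slivers, 3), root)
assert not verify_sliver(b"tampered", merkle_proof(slivers, 3), root)
```

A storage node verifies a received sliver with `verify_sliver`; a reader that reconstructs the blob must additionally re-encode it and re-derive the blob ID to rule out an inconsistent encoding.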
# Off-chain operations

While Walrus operations happen off Sui, they might interact with the blockchain flows defining the
resource life cycle.

## Write paths

![Write paths of Walrus](../assets/WriteFlow.png)

Systems overview of writes, illustrated in the previous image:

- A user acquires a storage resource of appropriate size and duration on chain, either by directly
  buying it on the Walrus system object, or a secondary market. A user can split, merge and
  transfer owned storage resources.

- When users want to write a blob, they first erasure code it using encode and compute the
  blob ID. Then they can perform the following steps themselves, or use a publisher to perform steps
  on their behalf.

- The user goes on chain (Sui) and updates a storage resource to register the blob ID with the
  desired size and lifetime. This emits an event, received by storage nodes. After the
  user receives confirmation of the event, they continue the upload.

- The user sends each of the blob slivers and metadata to the storage nodes that currently
  manage the corresponding shards.

- A storage node managing a shard receives a sliver and checks it against the blob ID.
  It also checks that there is a blob resource with the blob ID that is authorized to store
  a blob. If correct, the storage node then signs a statement that it holds the sliver for blob ID
  (and metadata) and returns it to the user.

- The user puts together the signatures returned from storage nodes into an availability certificate
  and sends it on chain. When the certificate is verified on chain, an availability event for the
  blob ID is emitted, and all other storage nodes seek to download any missing shards for the blob
  ID. This event emitted by Sui is the [point of availability (PoA)](./properties.md) for the blob
  ID.

- After the PoA, and without user involvement, storage nodes sync and recover any missing slivers.

The user waits for 2/3 of shard signatures to return to create the certificate of
availability. The rate of the code is below 1/3, allowing for reconstruction even if only 1/3 of
shards return the sliver for a read. Because at most 1/3 of the storage nodes can fail, this ensures
reconstruction if a reader requests slivers from all storage nodes. The full process can
be mediated by a publisher that receives a blob and drives the process to completion.

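The signature collection described above can be sketched as follows, assuming n = 3f + 1 shards and a 2f + 1 quorum (the "2/3 of shard signatures"); the message and signature formats here are placeholders, not Walrus's actual wire format:

```python
def quorum(n: int) -> int:
    """Signatures needed: with n = 3f + 1 shards, a 2f + 1 quorum."""
    return 2 * ((n - 1) // 3) + 1

def assemble_certificate(blob_id, signatures, n):
    """Collect (shard, signature) pairs; once a quorum of distinct shards
    has signed, bundle them into an availability certificate."""
    distinct = dict(signatures)          # one signature per shard
    if len(distinct) < quorum(n):
        return None                      # keep waiting for more nodes
    return {"blob_id": blob_id, "signers": sorted(distinct)}

# With 1000 shards, 667 signatures (~2/3) are required:
assert quorum(1000) == 667
assert assemble_certificate("0xblob", [(i, f"sig{i}") for i in range(667)], 1000)
```

The returned certificate is what the user sends on chain to trigger the availability event.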
## Refresh availability

Because no content data is required to refresh the duration of storage, refresh is conducted fully
on chain within the protocol. To request an extension to the availability of a blob, a user provides
an appropriate storage resource. Upon success this emits an event that storage nodes receive to
extend the time for which each sliver is stored.

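As a rough illustration of the bookkeeping involved, here is a toy sketch assuming a per-blob expiry registry; `registry`, `extend_availability`, and the event shape are all made up for illustration, since the real storage resource and event are Sui Move constructs:

```python
def extend_availability(registry, blob_id, extra_epochs):
    """Consume a storage resource covering `extra_epochs` more epochs and
    emit the event that storage nodes use to extend sliver retention."""
    if blob_id not in registry:
        raise KeyError("blob ID is not registered")
    registry[blob_id] += extra_epochs
    return {"event": "BlobExtended", "blob_id": blob_id,
            "end_epoch": registry[blob_id]}

registry = {"0xblob": 12}   # hypothetical blob expiring at epoch 12
event = extend_availability(registry, "0xblob", 5)
assert event["end_epoch"] == 17
```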
## Inconsistent resource flow

When a correct storage node tries to reconstruct a shard, it might fail if the encoding of a blob ID
past [PoA](./properties.md) was incorrect; in that case it instead extracts an inconsistency proof
for the blob ID. It then uses the proof to create an inconsistency certificate and uploads it on
chain. The flow is as follows:

- A storage node fails to reconstruct a shard, and instead holds an inconsistency proof.

- The storage node sends the blob ID and inconsistency proof to all storage nodes of the Walrus
  epoch. The receiving storage nodes verify the proof and sign it.

- The storage node aggregates the signatures into an inconsistency certificate and sends it to the
  Walrus smart contract, which verifies it and emits an inconsistent resource event.

- Upon receiving an inconsistent resource event, correct storage nodes delete sliver data for the
  blob ID and record in the metadata to return `None` for the blob ID for the
  [availability period](./properties.md). No storage attestation challenges are issued for this
  blob ID.

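The certificate step of this flow can be sketched as follows, assuming a 2f + 1 signature quorum out of n = 3f + 1 nodes and modeling proof verification and signing as plain callables; every name here is illustrative:

```python
def inconsistency_certificate(blob_id, proof, nodes, verify, sign):
    """Broadcast the proof to all nodes; each node that verifies it signs.
    With enough signatures (assumed quorum: 2f + 1 of n = 3f + 1), the
    certificate can go to the smart contract, which emits the event."""
    sigs = [sign(node, blob_id) for node in nodes if verify(proof)]
    if len(sigs) < 2 * ((len(nodes) - 1) // 3) + 1:
        return None
    return {"event": "InconsistentBlob", "blob_id": blob_id, "sigs": sigs}

nodes = list(range(4))                     # n = 4, f = 1, quorum = 3
cert = inconsistency_certificate("0xbad", b"proof", nodes,
                                 verify=lambda p: True,
                                 sign=lambda node, b: (node, b))
assert cert is not None
```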
A blob ID that is inconsistent always resolves to `None` upon reading: this is because
the read process re-encodes the received blob to check that the blob ID is correctly derived from a
consistent encoding. This means that an inconsistency proof reveals only a true fact to storage
nodes (that do not otherwise run decoding), and does not change the output of read in any case.

Note, however, that partial reads leveraging the systematic nature of the encoding might return
partial data for inconsistently encoded files. Thus, if consistency and availability of reads is
important, dApps should do full reads rather than partial reads.

## Read paths

A user can read stored blobs either directly or through a cache. The direct user journey is
discussed here because this is also how the cache operates in case of a cache miss. Assume that most
reads happen through caches for blobs that are hot, and do not result in requests to storage nodes.

- The reader gets the metadata for the blob ID from any storage node, and authenticates it using
  the blob ID.

- The reader then sends a request to the storage nodes for the shards corresponding to the blob ID,
  and waits for f+1 to respond. Sufficient requests are sent in parallel to ensure low latency for
  reads.

- The reader authenticates the slivers returned with the blob ID, reconstructs the blob, and decides
  whether the contents are a valid blob or inconsistent.

- Optionally, for a cache, the result is cached and can be served without reconstruction until it is
  evicted from the cache. Requests for the blob to the cache return the blob contents, or a proof
  the blob is inconsistently encoded.

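The quorum logic of the read path above can be sketched as follows, assuming n = 3f + 1 shards and treating sliver authentication and blob reconstruction as callables supplied by the reader (both are placeholders here):

```python
def slivers_needed(n: int) -> int:
    """With n = 3f + 1 shards and code rate below 1/3, f + 1 slivers decode."""
    return (n - 1) // 3 + 1

def read_blob(blob_id, responses, n, authenticate, reconstruct):
    """Keep only slivers that authenticate against the blob ID, then decode
    once f + 1 verified slivers are available."""
    good = [s for s in responses if authenticate(blob_id, s)]
    need = slivers_needed(n)
    if len(good) < need:
        return None                        # wait for more responses
    blob = reconstruct(good[:need])
    # A full reader must still re-encode `blob` and re-derive the blob ID
    # to rule out an inconsistently encoded blob.
    return blob

assert slivers_needed(10) == 4
assert read_blob("id", list("abcd"), 10,
                 authenticate=lambda i, s: True,
                 reconstruct="".join) == "abcd"
```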
## Challenge mechanism for storage attestation

During an epoch, a correct storage node challenges all shards to provide blob slivers past PoA:

- The list of available blobs for the epoch is determined by the sequence of Sui events up
  to the past epoch. Inconsistent blobs are not challenged, and a record proving this status
  can be returned instead.

- A challenge sequence is determined by providing a seed to the challenged shard. The sequence is
  then computed based both on the seed AND the content of each challenged blob ID. This creates a
  sequential read dependency.

- The response to the challenge provides the sequence of shard contents for the blob IDs in a
  timely manner.

- The challenger node uses thresholds to determine whether the challenge was passed, and reports
  the result on chain.

- The challenge/response communication is authenticated.

Challenges provide some reassurance that the storage node can actually recover shard data in a
probabilistic manner, avoiding storage nodes getting payment without any evidence they might
retrieve shard data. The sequential nature of the challenge and some reasonable timeout also ensure
that the process is timely.
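The sequential read dependency of the challenge sequence can be sketched as follows, assuming SHA-256 and a simple mapping from blob IDs to sliver bytes; the real seed format and scoring thresholds are not specified here:

```python
import hashlib

def challenge_sequence(seed: bytes, shard: dict, length: int):
    """`shard` maps blob IDs to stored sliver bytes. Each challenge index
    depends on the seed and the content read for the previous challenge,
    so the responder must actually read the data, in order."""
    blob_ids = sorted(shard)
    state, chosen = seed, []
    for _ in range(length):
        idx = int.from_bytes(hashlib.sha256(state).digest(), "big") % len(blob_ids)
        blob = blob_ids[idx]
        chosen.append(blob)
        # Chain in the content just read: a node missing this sliver
        # cannot compute the rest of the sequence.
        state = hashlib.sha256(state + shard[blob]).digest()
    return chosen

shard = {"blob-a": b"data-a", "blob-b": b"data-b", "blob-c": b"data-c"}
assert challenge_sequence(b"seed", shard, 5) == challenge_sequence(b"seed", shard, 5)
```

Because each step hashes in the sliver just read, a node that dropped any challenged sliver cannot produce the remainder of the sequence within the timeout.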