Skip to content

TzStamp Proof Protocol

Felipe Cardozo edited this page Oct 26, 2021 · 1 revision

Welcome to the tzstamp wiki!

Introduction

The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [2-5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it.

  — Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System

In order for an electronic currency system to work it must solve double spending. As the name implies double spending is when the same money is used to pay for two transactions. This occurs when there is imperfect consensus on what transactions have occurred. For example:

  1. Joe purchases something from Discount Tires
  2. Joe then immediately purchases something else from Crazy Circuits before the network updates
  3. Discount Tires and Crazy Circuits both post their transaction to the network
  4. Joe's wallet only covers the cost of one transaction which both merchants have claims to
  5. Joe has already run off with the merchandise, so it can't be repossessed

Now a financial mess has been made for Discount Tires and Crazy Circuits, with no clear way to resolve it. Therefore any digital currency system has to normalize the order and time that transactions are considered to have taken place. Bitcoin, Ethereum, and of course Tezos all do this. Because these currencies allow memos in their transactions, it is possible to encode information such as a SHA-256 hash into a public ledger. This provides unassailable evidence that the information identified by the hash existed on or before the date of the transaction.

Use of Bitcoin in this way has historically been controversial, but smart contract platforms like Tezos are designed for it. If all applications of this technology had to pay a transaction fee for each piece of information they timestamped, it would have a handful of niche uses and be otherwise unremarkable.

Thankfully with some clever restructuring of the data it is possible to make a single hash go much farther.

Previous Work

TzStamp is closest to the design of OpenTimestamps. The first solutions to extend blockchain timestamping took a public list of hashes from users and uploaded the hash of the list. This had the downside that it forced anyone who wanted to verify a timestamp to have access to the entire list. As the lists grew longer this became de-facto centralization. OpenTimestamps avoids this by using a Merkle tree structure to store its hashes. A Merkle tree works by starting with a base layer of items called "leaves". A tree is then constructed by dividing the leaves into pairs and hashing them, halving the number of items. This process is repeated until there is a single hash left called the Merkle root. The advantage of the Merkle tree is that it reduces the number of hashes required to verify an item is a member of the tree to the logarithm of the number of leaves. This means each item can have the proof of its timestamp reduced to kilobytes even for very large sets of leaves. Therefore it becomes trivial for an end user to store the information necessary to prove their timestamp.

TzStamp Protocol

TzStamp uses this same Merkle tree approach to its timestamps. It starts by accepting many hashes from users and then aggregating these in its internal Merkle tree. On a schedule, TzStamp derives a Merkle root and posts it in a transaction to the Tezos blockchain. Then to construct timestamp proofs, two components are compiled by TzStamp. One is a "high proof" showing that the Merkle root was included in a block. This proof makes sure that it is possible to verify a TzStamp proof even if indexers purge their transaction history. The high proof is then concatenated with the "low proofs" generated for each leaf in the Merkle tree. A low proof consists of the minimum number of tree nodes necessary to show that the data is a member of the leaves identified by the Merkle root.

The proof format is a text file consisting of a series of append, prepend, hash, etc operations. An "operation" is a pure function taking its arguments either from the proof file or the result of the previous operation. It is possible to construct partial low proofs which can be upgraded to a complete proof once their associated block is published and a high proof can be created. A complete proof should include a final attestation operation which produces a particular block hash on a Tezos network. In the following explanation Tezos network operations will be referred to as network-operations and TzStamp proof operations will be referred to as proof-operations.

Because this procedure is in effect chaining the local TzStamp Merkle root with the on-chain Merkle trees used by TzStamp, we will refer to the Merkle root published by TzStamp as the "aggregator root".

Constructing The High Proof

At publication time, a network-operation group is locally constructed and signed by TzStamp. The aggregator root appears at the parameters.value.bytes field of the transaction-kind operation within the group. The operation is sent to the service's configured node for inclusion. Once confirmed, the JSON serialization of the full block is fetched.

An operation-less proof is instantiated, taking the aggregator root as input. With no modifying proof-operations, its derivation is the aggregator root. We shall call this proof the "high proof".

From Aggregator Root To Operation Hash

The Tezos network-operation used to publish the aggregator root is located within the operations lists and extracted. The JSON network-operation is converted into Micheline binary using some custom code. Some fields within the operation group such as its own hash are not part of the binary originally sent to the node for inclusion and are ignored. The operation hash of an network-operation group is the BLAKE2b 32-byte digest of the Micheline binary; so the bytes before and after the aggregator root bytes are embedded in a Join proof-operation, followed by a BLAKE2b proof-operation, which are concatenated to the high proof. The derivation of the high proof is now the network-operation hash.

From Operation Hash To Operation(s) Hash

The current network shell uses a four-pass multivalidation schema. Management network-operation groups, including transaction-kind operations, will always appear in the fourth pass. The hash of each pass is the Tezos-style Merkle root containing each operation group in the pass. Tezos-style trees use the BLAKE2b 32-byte digest hash function and implicitly repeat the last leaf until the tree is perfect. If the pass has no operations, the root is the BLAKE2b 32-byte digest of null input.

All network-operation hashes are extracted and built into four Tezos-style Merkle trees, one for each pass. The operation hash of the network-operation used to publish the aggregator root is located within the fourth tree and the join-hash walk from leaf to root is added to the proof. The nodes of the other three pass trees are unneeded. The roots of the pass Merkle trees are then used to build another Tezos-style Merkle tree of size 4. The root of this tree is the operations hash present in the block header. The join-hash walk from the fourth leaf to the root is added to the high proof. The derivation of the high proof is now the network-operations hash.

From Operation(s) Hash To Block Hash

The block header is extracted and converted into Micheline binary, again ignoring fields such as its own hash that were not present in the original binary. The original header contains both the timestamp and the network-operations hash. The block hash is the BLAKE2b 32-byte digest of the Micheline binary of the header; so the bytes before and after the operations hash are embedded in a Join proof-operation, followed by a BLAKE2b proof-operation, which are concatenated to the high proof. The derivation of the high proof is now the block hash.

The server tries to verify the high proof. If it cannot, a change must have occoured in the protocol and the server will stop taking input.

Constructing The Low Proofs

A proof for each leaf in the aggregator Merkle tree is constructed, their respective inputs being their own bytes and the derivation being the aggregator root. We shall call these the "low proofs". Each low proof is concatenated with the high proof to create a full proof taking the file hash leaf as input and deriving to the block hash. The server stores these proofs for later retrieval.