-
All in all this looks great. A few small comments.
I would argue
Probably tbh. I'm not convinced they need separation.
A problem for another day :)
A slightly related issue is what Gabe just mentioned around warning chains about queued proofs.
I'm not sold on the name, but that's not the end of the world. In its current form per_epoch feels slightly confusing to me as you mean a given specific epoch rather than something occurring over multiple epochs.
I feel we don't want to do this immediately but after n epochs.
-
Following up on our discussion about evolving the database schema over time. Let me know what you think of the following design.

The idea is to have a storage initializer. The initializer is responsible for taking the storage from any previous version to the latest schema version. It consists of multiple "steps", each pushing the store state to the next version. When we need to change the schema, we add an initialization step, leaving the previous ones intact. The latest executed step is stored in the db, so we know to skip any steps that have already been executed. The initializer would be used roughly as follows:

```rust
let db = StoreInitializer::open("path/to/db")?
    .step(|dbtx| {
        // Here we can add CFs, remove CFs, copy data, change data format etc.,
        // through the provided handle to a db transaction.
    })?
    .step(|dbtx| {
        // Another step taking us to the next version up
    })?
    // More steps added here when the data format changes
    .finish(); // This gives us the final database handle initialized to the latest schema.
```

Rough implementation outline (making up the method names):

```rust
struct StoreInitializer {
    db: DB,
    cur_step: u32,
    init_db_step: u32,
}

impl StoreInitializer {
    fn open(db_path: Path) -> Result<Self> {
        let db = DB::open_with_cfs(db_path, ["StorageMeta"])?;
        let init_db_step = db.get("StorageMeta", "InitStep")?.unwrap_or(0);
        let cur_step = 0;
        Ok(Self { .. })
    }

    fn step(mut self, body: impl FnOnce(&DbTx) -> Result<()>) -> Result<Self> {
        self.cur_step += 1;
        if self.cur_step > self.init_db_step {
            // Everything in a transaction to make sure the migration changes and
            // step bump both happen atomically.
            let dbtx = self.db.transaction()?;
            body(&dbtx)?;
            dbtx.put("StorageMeta", "InitStep", self.cur_step)?;
            dbtx.commit()?;
        }
        Ok(self)
    }
}
```

It's basically equivalent to a series of `if` statements of the following form, but without the need to track the version / step number manually:

```rust
if init_db_step < 1 {
    let dbtx = db.transaction()?;
    do_some_stuff_with_dbtx(&dbtx)?;
    bump_initialization_step_to(&dbtx, 1);
    dbtx.commit();
}
/* ... */
if init_db_step < N {
    let dbtx = db.transaction()?;
    do_some_other_stuff_with_dbtx(&dbtx)?;
    bump_initialization_step_to(&dbtx, N);
    dbtx.commit();
}
```

The scheme could be improved / made more robust in a number of ways.
Let me know what you think.
-
This discussion is here to present a first iteration of the storage design for the AggLayer.
The goal is to define how we store the data and how the data is represented inside a node.
This discussion doesn't take into account some topics such as synchronization, Merkle tree storage performance, versioning and Store API design.
Introduction
To gather and store the data related to the AggLayer and the pessimistic proofs being managed, we'll rely on RocksDB.
The goal of this first iteration is to make the AggLayer fault-tolerant with regard to crashes, reboots and redeployments.
Another condition is that the storage must be embedded, so as not to rely on a managed or external service.
Access to the data should be as simple and as quick as possible.
The rest of the document will use this glossary, to which you can refer at any time:

- `ProvenCertificate`: a certificate that has an associated proof

Defining the needs
In order to store data efficiently, we need both quick access to important metadata and access to heavier data when needed. We also need the capability to get values and check the existence of keys quickly.
We also want to achieve these kinds of actions:

- For one `network` and for a particular `height`, being able to fetch the `certificate_id` and the epoch it's settled in
- For one `certificate_id`, being able to retrieve a `CertificateHeader` which consists of: `height`, `epoch_number`, `certificate_index`, `proof_index` and `network_id`
- Being able to get the latest settled epoch number
- Being able to queue `UnprovenCertificate` per network and height
- For one `certificate_id`, being able to fetch a generated proof
- Being able to store the SMT and `local_exit_tree`
The different logical stores
This discussion doesn't aim to define those stores in depth, but we need some components to check whether the storage can be used in a consistent way.
Based on the actions defined above, we can define multiple "stores". By `Store`, I mean a logical entity responsible for executing and managing resources to offer higher-level functions for accessing the data. However, the store doesn't own the data; it just facilitates access to it.

We can list those stores:

- `State`, which contains critical and mandatory data for the AggLayer to work
- `Pending`, for everything related to the pending queues
- `Metadata`, for persistent information related to the node itself and not the data it manages
- `Index`, for data that is not critical but that can facilitate interoperability
- `PerEpoch`, for information and data related to an epoch

All stores are instantiated once, except for `PerEpoch`, which is instantiated for each epoch.

If we take the list of actions, we can assign each action to one or multiple stores:
For one `network` and for a particular `height`, being able to fetch the `certificate_id` and the epoch it's settled in

Knowing if a `certificate` exists at a particular height for a network is critical in order to accept or deny an incoming certificate. This action could be fully handled by the `StateStore`.

For one `certificate_id`, being able to retrieve a `CertificateHeader` which consists of: `height`, `epoch_number`, `certificate_index`, `proof_index` and `network_id`

The `CertificateHeader` isn't critical for the AggLayer itself, as it doesn't really need this information, but it can be really useful for an external component to fetch. The `IndexStore` seems to be the best place for that; the certificate headers could also be moved to a completely different storage mechanism if needed.

Being able to get the latest settled epoch number

When the AggLayer reboots or starts, it needs to know quickly which epoch has been settled; the `MetadataStore` seems to be the one for that. (This information could maybe be fetched from L1, but for the sake of simplicity, we'll store it for now.)

Being able to queue `UnprovenCertificate` per network and height

`UnprovenCertificate`s are certificates that are verified but not yet proven; they are not yet part of the state, nor part of an epoch. This seems to be a `PendingStore` candidate.

For one `certificate_id`, being able to fetch a generated proof

This is a particular action: it implies that we have both a `certificate` and a `proof` that can be settled into an epoch.
There are two possible solutions:

- We keep generating `proofs` when receiving certificates, and it's not directly linked to an epoch: `PendingStore`
- We generate `proofs` for certificates only when having an epoch to put them into: `PerEpochStore`

Being able to store the SMT and `local_exit_tree`

As those trees are important, they fall into the `StateStore`.
The different physical storage
As defined above, the `Store` will not own the data; the `Storage` will. I define `Storage` as the physical storage that owns the data and that can be persisted. The `Storage` is a combination of one or multiple `database`s (RocksDB instances) and an abstraction layer.

This abstraction layer is of two kinds:

- The `DB` layer, which abstracts the API/interface of the RocksDB `database`
- The `Columns` layer, which defines the different CFs that are used by the store to interact with the DB

The `DB` layer will not be covered here, as it is purely a technical implementation detail.
For the `Columns`, it could be interesting to define a first iteration of how those CFs are distributed across the `database`s.
.Data structures involved
Before diving into the CF definitions, let's define what we'll have to store. (This information can change after this discussion, but I will use it for the rest of the doc.)
The `database` needs to hold:

CFs definition
In this section, I will explain the first design of the CFs that can be used by the stores:
For this section, some keys or values are defined using parentheses: it means that the key or value contains multiple "values", but the whole thing is serialized into one single byte array. Double-quoted values or keys mean that they are plain-text encoded. Multiple occurrences of `Key -> Value` define multiple possible key formats inside the same CF.

certificate_per_network

This CF stores for each network the settled `certificate_id`, with the associated epoch and epoch index.

certificate_header

This CF stores for each `certificate_id` the `CertificateHeader`.

latest_settled_certificate_per_network

This CF stores for each `network_id` the latest settled `certificate_id` with the `height` and `epoch` information.

metadata

This CF stores all the metadata of the AggLayer instance.
For now, I can think of one important entry:
`latest_settled_epoch => epoch_number`

proof_per_certificate

This CF stores for each `certificate_id` the associated proof bytes.

pending_queue

This CF stores the pending queue of `UnprovenCertificate`.
Currently, the value is an array of certificate bytes, as we could receive multiple concurrent certificates for the same height.
This could be optimized in a next iteration.

local_exit_tree_per_network

This CF stores for each network the local exit tree. This could be replaced by an SMT.

nullifier_tree_per_network

This CF stores for each network the nullifier tree with its associated `root` and `leaves`.

balance_tree_per_network

This CF stores for each network the balance tree with its associated `root` and `leaves`.

proofs_per_epoch

This CF stores for each epoch the list of proofs generated.

certificates_per_epoch

This CF stores for each epoch the list of certificates settled.

metadata_per_epoch

This CF stores for each epoch the different metadata related to that epoch.
Currently, I can think of these entries:
`epoch_number -> N`
`tx_hash -> transaction hash settled on L1`
`number_of_certificates -> N`
Assigning CFs to physical database
After defining the CFs, we can assign them to different `database`s.
At first, we can see that the `certificate bytes array` and `proof bytes array` can only be found in the `pending_queue` and `certificates_per_epoch` for certificates, and in `proof_per_certificate` and `proofs_per_epoch` for proofs. It means that those types can be in two states:

- `UnprovenCertificate`: a certificate without an associated proof, or not associated to an epoch
- `ProvenCertificate`: a certificate with an associated proof and associated to an epoch

`UnprovenCertificate` could be placed into a `database` named `pending`, while `ProvenCertificate` could be placed into a "per_epoch" database to clusterize the data. A `per_epoch` database will hold everything related to an epoch; after an epoch is closed, this database can be put in read-only mode, archived and even pruned. This could prevent the `State` of the AggLayer from aggregating too much data in one single database.

`Indexes` and `metadata` are not really mandatory, but come close; we have two choices:

- Two `database`s, one for `indexes` and the other for `metadata`
- A single `database` which contains both

In any case, this `database` will own: `metadata`, `latest_settled_certificate_per_network`, `certificate_header`.

The next big one contains everything needed to perform the AggLayer work; we can define a `state` database that contains all of that. The representation would be the following:

If we assign the CFs to those databases:
Conclusion
This document is way too long, I know. If you have any questions or remarks, feel free to raise them!
For me, there are still pending questions:

- Do we merge `metadata` and `indexes` into a single `database`?
- Does `latest_settled_certificate_per_network` belong in the `state` database or in the `metadata`/`indexes` one?