Reduce disk space for vochain statedb #309

Closed
p4u opened this issue Oct 7, 2021 · 5 comments

p4u (Member) commented Oct 7, 2021

For a vochain network with the following details:

  • height:4021
  • processes:550
  • votes:4789145

The disk space used is 13G distributed as follows:

  • 3.1G blockstore.db
  • 160M state.db
  • 9.0G vcstate

While the full list of transactions (stored by Tendermint) takes 3.1 GiB, the vochain state uses 9 GiB. This seems excessive, since the state should not store the full vote package, only its hash. Why is this happening, and how can we reduce the disk space required by statedb?

p4u (Member, Author) commented Oct 7, 2021

Oh, I see now that the state stores the full bytes of every cast vote: https://github.com/vocdoni/vocdoni-node/blob/master/vochain/state.go#L576

In the previous version of statedb we were only saving the Hash of the vote, since the vote bytes can be recovered from the Tendermint blockstore.

Is there a reason for this change @ed255 ?

EDIT: Wrong comment, the current code stores only the hash 😅

p4u (Member, Author) commented Oct 7, 2021

The current AddVote method stores the following information:

	sdbVote := models.StateDBVote{
		VoteHash:  ethereum.HashRaw(voteBytes),
		ProcessId: vote.ProcessId,
		Nullifier: vote.Nullifier,
	}

These are three 32-byte fields, so 96 bytes per vote. Taking into account the number of votes, the total raw data stored for votes should be around 438 MiB, which is far from the 9 GiB currently used by statedb.
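
As a quick sanity check of that estimate (a sketch, not code from the repository), the raw payload implied by the numbers above can be computed directly:

    package main

    import "fmt"

    func main() {
        const (
            votes        = 4789145 // votes reported for this network
            bytesPerVote = 3 * 32  // VoteHash + ProcessId + Nullifier, 32 bytes each
        )
        total := votes * bytesPerVote
        // 4,789,145 * 96 bytes ≈ 438 MiB of raw vote payload
        fmt.Printf("raw vote payload: %d bytes (~%.0f MiB)\n", total, float64(total)/(1<<20))
    }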

ed255 (Contributor) commented Oct 11, 2021

Here are two sources of storage overhead that I can think of.

Storage of nodes in outdated paths in Arbo

For every Tree.Add / Tree.Set, some of the nodes in the merkle tree are updated. For each node update, Arbo creates a new key-value entry in the DB and doesn't delete the key-value entry corresponding to the updated node. This allows traversing the tree from old Roots, but also means that outdated nodes (nodes that can't be reached with the current tree) are stored in the DB.

For example, take a balanced tree with 1024 leaves (that's 10 levels) and update every leaf. Since every node in the tree has now been rewritten, we end up using more than double the space we used before. Here are some rough numbers for this example:

L = leaf size
N = intermediate node size

# Original tree with 1024 leaves
size = (1024 * 2 - 1) * N + 1024 * L

# Tree after updating all 1024 leaves
size = (1024 * 2 - 1) * N + 1024 * L + 10 * 1024 * N + 1024 * L
     = (12 * 1024 - 1) * N + 2 * 1024 * L

Notice that a leaf update in this tree rewrites the 10 nodes on the leaf's path, so we outdate 10 nodes each time.
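
For concreteness, here is a small sketch of the arithmetic above with made-up sizes (the N and L values are assumptions for illustration, not arbo's real node sizes):

    package main

    import "fmt"

    func main() {
        const (
            leaves = 1024 // balanced tree, 10 levels
            depth  = 10
            N      = 64 // assumed intermediate-node size in bytes (illustrative only)
            L      = 64 // assumed leaf size in bytes (illustrative only)
        )
        original := (2*leaves-1)*N + leaves*L
        // Each leaf update rewrites the 10 nodes on its path plus the leaf itself,
        // and the outdated versions stay in the DB.
        afterUpdates := original + depth*leaves*N + leaves*L
        fmt.Printf("original: %d bytes, after %d updates: %d bytes (%.1fx)\n",
            original, leaves, afterUpdates, float64(afterUpdates)/float64(original))
    }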

Solutions

A. Arbo could delete outdated nodes on each operation. I think this is quite easy to achieve; however, it would require the StateDB to use a single Tx per block update, and currently we use more than one Tx to keep any single Tx from growing too large. BadgerDB has a preconfigured limit on the number of writes in a Tx, whereas Pebble can use as much memory as is available.

Edit1: I have thought about a solution for the fact that the StateDB does multiple commits internally within a single block. Similarly to the db.Batch type we have, we could have a special type that commits internally when the Tx becomes too big, but queues the deletes in a list that is only applied on db.Batch.Commit(). This way we make sure we don't delete nodes before ending a block, which would be problematic if a crash happened at that point.
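
A minimal sketch of that idea, assuming a generic transaction interface (the names and signatures here are illustrative, not the actual db package API):

    // WriteTx is a stand-in for a generic key-value transaction, not the real
    // go.vocdoni.io/dvote/db interface.
    package db

    type WriteTx interface {
        Set(key, value []byte) error
        Delete(key []byte) error
        Commit() error
    }

    // DeferredDeleteBatch writes through to an underlying transaction, committing
    // and reopening it internally when it grows too big, but queues deletes and
    // only applies them on the final Commit. If the process crashes mid-block,
    // nothing has been deleted yet, so the previous state is still fully reachable.
    type DeferredDeleteBatch struct {
        newTx   func() WriteTx // opens a fresh transaction on the same DB
        tx      WriteTx
        pending [][]byte // keys queued for deletion until the block ends
        writes  int
        maxTx   int // write threshold that triggers an internal commit
    }

    func NewDeferredDeleteBatch(newTx func() WriteTx, maxTx int) *DeferredDeleteBatch {
        return &DeferredDeleteBatch{newTx: newTx, tx: newTx(), maxTx: maxTx}
    }

    func (b *DeferredDeleteBatch) Set(key, value []byte) error {
        if b.writes >= b.maxTx {
            // Internal commit to keep the Tx small; queued deletes are NOT applied here.
            if err := b.tx.Commit(); err != nil {
                return err
            }
            b.tx = b.newTx()
            b.writes = 0
        }
        b.writes++
        return b.tx.Set(key, value)
    }

    func (b *DeferredDeleteBatch) Delete(key []byte) error {
        b.pending = append(b.pending, key) // applied only at the end of the block
        return nil
    }

    // Commit applies the queued deletes and commits, ending the block.
    func (b *DeferredDeleteBatch) Commit() error {
        for _, key := range b.pending {
            if err := b.tx.Delete(key); err != nil {
                return err
            }
        }
        b.pending = nil
        return b.tx.Commit()
    }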

B. Figure out a way to keep only the nodes accessible from "pinned" snapshots. This is what filesystems that support snapshots do; I don't know how it would work internally here. With this, we could pin the last 5 blocks in the StateDB, for example, and delete all the nodes that can't be reached from those 5 roots. This deletion could happen after a block is ended.
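
Purely as an illustration, the pinned-snapshot cleanup could look like a mark-and-sweep from the pinned roots; the nodeStore interface and the children decoder below are assumptions about the storage layout, not arbo's API:

    package statedbgc

    type nodeStore interface {
        Get(key []byte) ([]byte, error)
        Delete(key []byte) error
        IterateKeys(f func(key []byte) bool) error
    }

    // collectReachable walks the tree from every pinned root (e.g. the roots of
    // the last 5 blocks) and returns the set of node keys reachable from them.
    // children extracts the child node keys from an intermediate node's value and
    // returns nothing for a leaf; its implementation depends on the node encoding.
    func collectReachable(store nodeStore, pinnedRoots [][]byte,
        children func(value []byte) [][]byte) (map[string]bool, error) {
        reachable := make(map[string]bool)
        stack := append([][]byte{}, pinnedRoots...)
        for len(stack) > 0 {
            key := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if reachable[string(key)] {
                continue
            }
            reachable[string(key)] = true
            value, err := store.Get(key)
            if err != nil {
                return nil, err
            }
            stack = append(stack, children(value)...)
        }
        return reachable, nil
    }

    // sweep deletes every stored node that is not reachable from the pinned
    // roots. It would run after a block is ended.
    func sweep(store nodeStore, reachable map[string]bool) error {
        var toDelete [][]byte
        if err := store.IterateKeys(func(key []byte) bool {
            if !reachable[string(key)] {
                toDelete = append(toDelete, append([]byte{}, key...))
            }
            return true
        }); err != nil {
            return err
        }
        for _, key := range toDelete {
            if err := store.Delete(key); err != nil {
                return err
            }
        }
        return nil
    }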

C. On node reboot, export and import all the arbo trees. This would apply to the censusManager and the StateDB. By exporting and importing the leaves, we discard all the outdated nodes, although this operation takes longer the bigger the trees are, and it requires a node restart.
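
A rough sketch of what C amounts to, with a hypothetical leaf-iteration interface (in practice arbo's own export/import helpers would be used; the names below are placeholders):

    package treecompact

    // leafTree is a minimal stand-in for an arbo-like tree; the method names
    // are assumptions for this sketch, not the real arbo API.
    type leafTree interface {
        // IterateLeaves calls f for every leaf key/value currently in the tree.
        IterateLeaves(f func(key, value []byte) error) error
        // Add inserts a leaf.
        Add(key, value []byte) error
    }

    // compact copies every live leaf from old into fresh (an empty tree backed
    // by a new database). Outdated intermediate nodes are simply never written,
    // so the new database only holds the current state. This would run on node
    // reboot and takes longer the bigger the tree is.
    func compact(old, fresh leafTree) error {
        return old.IterateLeaves(func(key, value []byte) error {
            return fresh.Add(key, value)
        })
    }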

Considerations

Currently the StateDB doesn't store information about the subtree layout, which some of the solutions would require. This could be done by making sure that after opening a subTree, the subTree config is stored in the NoState KVDB of the mainTree, or in the metadata section of the underlying DB. That would allow programmatically operating on all the subTrees without needing to specify each one. There is a challenge though: a subTreeConfiguration requires 2 functions, and we can't serialize functions into a DB :S
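
Not something proposed in the thread, but for illustration, one common way around the function-serialization problem is to persist only a config identifier and keep the functions in an in-code registry (all names below are hypothetical, not the real StateDB types):

    package statedbcfg

    import "fmt"

    // subTreeFuncs holds the two non-serializable functions a subTree config
    // needs; the signatures are placeholders.
    type subTreeFuncs struct {
        getRoot func(leafValue []byte) (root []byte, err error)
        setRoot func(leafValue, root []byte) (newLeafValue []byte, err error)
    }

    // registry maps a serializable config name to its functions. Only the name
    // (plus any plain-data parameters) would be written to the NoState KV of
    // the mainTree or to the DB metadata section.
    var registry = map[string]subTreeFuncs{}

    func register(name string, fns subTreeFuncs) { registry[name] = fns }

    // resolve loads the functions for a config name read back from the DB.
    func resolve(name string) (subTreeFuncs, error) {
        fns, ok := registry[name]
        if !ok {
            return subTreeFuncs{}, fmt.Errorf("unknown subTree config %q", name)
        }
        return fns, nil
    }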

Key Value DB uses long prefixes in StateDB

The subTrees in the StateDB are stored in a single key-value DB using prefixes. The deeper the subtree, the longer the key prefix will be. For example, the VoteTree sits under the ProcessTree, which sits under the MainTree, so every key-value entry stored by the VoteTree is prefixed with the ProcessID (which is 32 bytes). If the key-value DB stores the raw key for each entry (instead of storing the key as part of a radix tree, for example), this means a space overhead for every KV entry, which can add up. This needs to be investigated to figure out whether the overhead is noticeable (for example, write a test that fills two Arbo trees, one with a raw WriteTx and another with a PrefixedWriteTx, and compare the disk usage of both).
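
As a quick back-of-the-envelope sketch of how much raw prefixes could add, assuming keys are stored verbatim (the entries-per-vote figure below is an assumption for illustration, not a measurement):

    package main

    import "fmt"

    func main() {
        const (
            votes          = 4789145 // vote count from this issue
            prefixLen      = 32      // ProcessID prefix on every VoteTree key
            entriesPerVote = 11      // assumed KV entries written per vote (illustrative)
        )
        overhead := votes * prefixLen * entriesPerVote
        fmt.Printf("estimated prefix overhead: ~%.1f GiB\n", float64(overhead)/(1<<30))
    }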

arnaucube (Contributor) commented:

Update: talked with @ed255 offline, and we agreed to go with option A. Added an issue in arbo to implement this: vocdoni/arbo#25
I understand this is not an 'urgent' thing to have, but once I finish the current anonymous-voting tasks, it is the first thing I will implement.

p4u (Member, Author) commented Sep 1, 2023

Finally implemented as part of this PR #1080

p4u closed this as completed Sep 1, 2023