size of table files feature request #219
Comments
If we are planning to modify the db parameters, it also makes sense to add a flag to enable/disable db compression, for now for test purposes. It may be more efficient to disable compression at the DB level and use hardware compression at the disk level. It could also be useful for backup purposes to implement incremental backups.
Feels like a disclaimer in the README about not using this with --archive could be a reasonable addition. It's now been 3 weeks syncing ksm, constantly getting stuck, on a ZFS (5x2TB striped pool, Samsung 980) / 128GB DDR5 ECC / 7950X setup inside an LXC container; the ksm paritydb archive is at block height 15067273.
Yeah, not sure how I am supposed to snapshot/backup this, so I will just resync using rocksdb for the dot/ksm archive nodes and leave westend with paritydb for now.
Why not use a proper backup protocol, such as rsync, for example? It solves both issues.
Does anyone really provide snapshots as uncompressed raw dir dumps? I've just googled for polkadot snapshots and the first 3 results were either lz4 or tar archives.
Rsync does not work natively with cloud storage providers. We use rclone.
Yes we do. See https://github.com/paritytech/helm-charts/tree/main/charts/node#public-snapshots
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/optimal-filesystem-for-blockchain-databases/6116/1
We are discussing big archive DBs. Any compression algorithm only saves a couple of percent of size in our case. But if you use one file, it is impossible to sync it fast in a multi-threaded way or to use diff sync (updating only changed db files); it also makes it difficult to resume downloading after an interruption. In general, it is really difficult to upload a 2-3 terabyte archive file to cloud storage. Also, CDNs don't allow caching of big files. We could use an additional layer of archiving to split big paritydb files into smaller files, but that requires more resources on the client side to unpack them (CPU, extra disk space).
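The "additional layer of archiving" mentioned above could be sketched like this: split one big table file into fixed-size chunks that can be uploaded and downloaded in parallel, and resumed per chunk. File names and sizes here are tiny stand-ins for real ~200 GB table files.

```shell
# Split a big file into fixed-size, individually resumable chunks.
bigfile=$(mktemp)
head -c 1048576 /dev/urandom > "$bigfile"                  # 1 MiB stand-in

workdir=$(mktemp -d)
split -b 262144 -d "$bigfile" "$workdir/table_00.chunk_"   # 256 KiB chunks

# The client downloads the chunks, reassembles them in lexicographic
# order, and verifies the result against the original.
cat "$workdir"/table_00.chunk_* > "$workdir/reassembled"
cmp -s "$bigfile" "$workdir/reassembled" && echo "reassembled file matches"
```

This is exactly the client-side cost described above: the split/reassemble step burns CPU and needs roughly double the disk space during unpacking.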
Needs
Syncing archive nodes from scratch takes a lot of time. We also sometimes have sync issues, like the incoming network slots issue before, or the paritydb issue now. We already have the `--sync warp` feature, but it can't be used for archive nodes and it doesn't resolve all sync issues (network issues, disaster recovery when a network doesn't work, etc.). At the same time, networks need archive nodes, which means users need to be able to spin up their nodes in a reasonable time. That's why binary db snapshots are still very important for now.
How it works now
rocksdb uses a lot of small files, but paritydb uses a smaller number of really big files. For archive paritydb nodes, table files can exceed 160-200 GB.
Big files lead to some issues: they can't be synced in parallel or diffed incrementally, interrupted downloads are hard to resume, and CDNs won't cache them. These things lead to the use of additional backup systems that can split files into chunks, maintain their own DB of chunks, and sync them based on checksums. That takes a lot of additional CPU time, and it makes using snapshots more difficult for users.
The request
Is it possible to add a new CLI flag that restricts the size of paritydb table files for newly synchronized nodes?
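If such a flag existed, an invocation might look like the sketch below. The flag name `--db-max-table-file-size` and its value are invented here purely to illustrate the request; no such option exists today, and the other flags shown are only one plausible archive-node setup.

```shell
# HYPOTHETICAL: --db-max-table-file-size is not a real flag; it only
# illustrates what the requested option could look like.
polkadot \
  --chain kusama \
  --state-pruning archive \
  --database paritydb \
  --db-max-table-file-size 16G
```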