-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate and compare CRC when writing and reading ledger snapshots #1319
Conversation
41891a0
to
e449818
Compare
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Util/CBOR.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
b001c90
to
a599027
Compare
ouroboros-consensus/changelog.d/20241128_084625_georgy.lukyanov_892_checksum_snaphot_file.md
Outdated
Show resolved
Hide resolved
ouroboros-consensus/changelog.d/20241128_084625_georgy.lukyanov_892_checksum_snaphot_file.md
Outdated
Show resolved
Hide resolved
...s-consensus-cardano/changelog.d/20241127_163624_georgy.lukyanov_892_checksum_snaphot_file.md
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Show resolved
Hide resolved
ouroboros-consensus/changelog.d/20241128_084625_georgy.lukyanov_892_checksum_snaphot_file.md
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
31b892a
to
5d39721
Compare
I've compared the performance of reading and then writing a snapshot using I would like to spend some more time looking at the microbenchmarks that we currently have and maybe adding one for reading-writing ledger state snapshots. Logs of `db-analyser` with timingsFeature 101.325841s
Main 99.685827s
|
e9ae82c
to
4286fdd
Compare
b524f64
to
c8c082e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good in general, but we should propagate this configuration option to the node and expose it on the JSON/YAML config file.
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Util.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Init.hs
Outdated
Show resolved
Hide resolved
...s-consensus-cardano/changelog.d/20241129_122139_georgy.lukyanov_892_checksum_snaphot_file.md
Outdated
Show resolved
Hide resolved
c8c082e
to
584d4d4
Compare
f27ca6d
to
15ec8ee
Compare
1ec6cdc
to
dcf27db
Compare
ouroboros-consensus-cardano/src/unstable-cardano-tools/Cardano/Tools/DBAnalyser/Analysis.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ChainDB/Impl/LgrDB.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/DiskPolicy.hs
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Init.hs
Outdated
Show resolved
Hide resolved
ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Snapshots.hs
Outdated
Show resolved
Hide resolved
5df819b
to
4200b4a
Compare
- Allow skipping snapshot checksum check - Generalise `Test/Ouroboros/Storage/LedgerDB/OnDisk.hs` - Restrict `Ord` instance for `DiskSnapshot` to `dsNumber` - Use the `Ord` instance in `listSnapshots` - Connect snapshot checksum with the node interface: - Add `Flag "DoDiskSnapshotChecksum"` to `DiskPolicyArgs` - Expose `(No)DoDiskSnapshotChecksum` in Ouroboros.Consensus.Node - Re-export `Flag` from `DiskPolicy` - `db-analyser` changes: - use checksums by default as that's what `cardano-node` will do - Allow independently disabling checksum on reading and writing
4200b4a
to
2eef543
Compare
Fixes #892
Integration into
cardano-node
: IntersectMBO/cardano-node#6047. This uses the branch geo2a/issue-892-checksum-snaphot-file-release-ouroboros-consensus-0.21.0.0-backport which is the backport of this PR onto the most resent release of theouroboros-consensus
package.In this PR, we change the reading and writing disk snapshots of ledger state. When a snapshot is taken and written to disk, an additional file with the
.checksum
extension is written alongside it. The checksum file contains a string that represent the CRC32 checksum of the snapshot.The checksum is calculated incrementally, alongside writing the snapshot to disk. When a snapshot is read from dist, the checksum is again calculated and compared to the tracked one. If the checksum is different,
readSnaphot
returns theReadSnapshotDataCorruption
error, indicating data corruption.The checksum is calculated incrementally, alongside reading a writing the data. On write, we use the
hPutAllCRC
function fromfs-sim
, and on read we modify the readIncremental function to compute the checksum as data is read.To enable seamless integration into
cardano-node
, we make the check optional. When initialising the ledger state from a snapshot ininitLedgerDB
, we issue a warning in case the checksum file is missing for a snapshot, but do not fail as in case of invalid snapshots.The
db-analyser
tool ignores the checksum files by default when reading the snapshots. We add--disk-snapshot-checksum
flag to enabled the check. When writing a snapshot to disk, e.g. as a result of the--store-ledger
analysis,db-analyser
will always writecalculate the checksum and write it into the snapshot's
.checksum
file.Tests
There state machine test in
Test.Ouroboros.Storage.LedgerDB.OnDisk
is relevant to this feature, and has caught a number of silly mistakes in the process of its implementation, for example forgetting to delete a checksum file when the snapshot is deleted.The model in the test does not track checksums, and I do not think it can (or should) be augmented to do that. Howerver, the
Snap
andRestore
events are now parameterised by the checksum flag, and the values for the flag are randomised when generating these events. This leads to testing the following properties:Restore
events will always lead to restoring from a snapshot, even ifSnap
events do not write checksum files (i.e. their flag isNoDoDiskSnapshotChecksum
~=False
).DoDiskSnapshotChecksum
flag changes in the code base and becomes strict, i.e. hard fail if the checksum file is missing, this test will discover that.Effects on Performance:
Running
db-analyser
to read a ledger snapshot and store the snapshot of the state at the next slot shows a difference of 2 seconds on my machine. See a comment below for the logs.To precisely evaluate the effects, we need a micro-benchmark of the reading and writing of snapshots with and without the checksum calculation.