-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: events: sqlite db improvements #12090
Conversation
* fix unclosed multi-row query * tune options to limit wal growth Ref: #12089
All checks have passed |
I think I've cracked it! This line, right at the beginning of initialising the database checks whether we need to make the database or not: lotus/chain/events/filter/index.go Line 440 in 63b9577
In normal operation, we end up at I believe this is the read operation that's left open for the entirety of the lifetime of the database so the checkpoint never gets to run and the WAL just grows and grows. This could also explain performance problems too if we have a mammoth WAL that we need to process when doing queries too. The changes in here force a WAL truncation on checkpoint, and I've lowered the checkpoints to 256 pages. Our page size is unfortunately set quite large for this db (set on db create), so 256 gets us to ~8MiB WAL size before checkpoint and then truncate. The default is 1000, which would land us closer to 32MiB. Watching the chain as I write this, I'm seeing it truncate every 3-4 epochs where it hits around the 8MiB mark. So it's ~2MiB of events per epoch, although it's extremely variable. There'll probably be epochs where we easily blow past the 8MiB mark. So, we could increase this number, but compacting the write-ahead log every epoch or 4 doesn't seem like such a bad thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
* fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: filecoin-project#12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures
* fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: filecoin-project#12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures
* fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: #12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures
Continuing to monitor my wal file and the behaviour is not what I expected: mostly it seems to hover around what I showed above, but it has long stretches where it gets stuck at a particular, overly large, size. A few hours at 42MiB; the next day a stretch of ~5h at 44MiB, down again for 2 epochs then up to 37MiB for another ~6h, up from there to 41MiB for a couple of hours, then back down again; later the same day it grew up to 140MiB for a few hours but now it's running normal again. The most important thing is that it eventually shrinks. However the behaviour of growing and getting stuck at a size well above the set maximum is strange because I can't reconcile it with what the docs suggest and what I know of the access patterns. Like we have long-running read locks that eventually free up after hours; but that doesn't square with how we run this thing. Perhaps this is just internal sqlite behaviour that's not clearly documented. |
* fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: filecoin-project#12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures
* fix: ci: do not use deprecated --debug goreleaser flag (#12086) * chore: deals: remove forgotten graphsync references (#12084) * chore: types: remove more items forgotten after markets (#12095) * chore: cleanup: remove more items forgotten after markets * .gz somehow reappeared after #11625 * fix: ETH RPC API: ETH Call should use the parent state root of the subsequent tipset (#11905) * fix eth call * tests * changes as per review * changes as per review * Update node/impl/full/eth.go Co-authored-by: Rod Vagg <[email protected]> * fix as per review --------- Co-authored-by: Rod Vagg <[email protected]> * Update changelog to RC2 Update changelog to RC2 * Make gen / make docsgen-cli Make gen / make docsgen-cli * chore: api: the Net API/CLI now remains only on daemon The only part of this repository that does lp2p is now lotus-daemon Remove the CommonNet type, used exclusively bu the CLI stack Adjust the rest of struct-memebership to match what went where End result best seen in diff of `documentation/en/api-v0-methods-miner.md` * Update changelog Update changelog * fix: events: sqlite db improvements (#12090) * fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: #12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures * Update CHANGELOG.md --------- Co-authored-by: Piotr Galar <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]> Co-authored-by: Aarsh Shah <[email protected]> Co-authored-by: Rod Vagg <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]>
* release: v1.26.3 (#11908) (#11915) * deps: update dependencies to address migration memory bloat to address memory concerns during a heavy migration Ref: filecoin-project/go-state-types#260 Ref: whyrusleeping/cbor-gen#96 Ref: filecoin-project/go-amt-ipld#90 * release: prep v1.26.3 patch Prep v1.26.3 patch release: - Update changelog, version and make gen + make docsgen-cli * deps: update cbor-gen to tagged version deps: update cbor-gen to tagged version * deps: update go-state-types to tagged version deps: update go-state-types to tagged version v0.13.2 * chore: deps: update go-state-types to v0.13.3 Fixes a panic when we have fewer than 1k proposals. --------- Co-authored-by: Rod Vagg <[email protected]> Co-authored-by: Steven Allen <[email protected]> * build: release: v1.27.0-rc1 (#11947) * chore: Set version as v1.27.0-rc1 Set version as v1.27.0-rc1, run make gen & make docsgen-cli * Update changelog Update changelog * Update changelog Update changelog based on feedback * Bump pubsub-dep Bump pubsub-dep * Prep v1.27.0-rc2 Prep v1.27.0-rc2 * Typo fixes, and more changelog updates Typo fixes, and more changelog updates * chore: remove unmaintained bootstrappers (#11983) * chore: remove unmaintained bootstrappers chore: remove unmaintained bootstrappers * Update mainnet.pi fixing typoed domain fixing typo for 1475.io 'bootstarp' -> 'bootstrap' * Update mainnet.pi apparently the actual hostname is typoed. so bootstarp it is. --------- Co-authored-by: smagdali <[email protected]> * chore: update go-data-transfer and go-graphsync * add ETH addrs API to Gateway (#11979) * fix: copy Flags field from SectorOnChainInfo Fixes: #11962 * feat: libp2p: Lotus stream cleanup (#11993) * set stream deadlines in Lotus * reduce timeout * whitelist bootstrappers * fix tests * Update changelog and version Update changelog and version * ci: deprecate circle ci in favour of github actions (#11786) * Update changelog Update changelog with the deprecate circle-ci * chore: update drand (#12021) * Update changelog / make docsgen Update changelog / make docsgen * chore: lint: update golangci lint config * remove and replace some linters * remove some exclusions * make all exclusions more explicit matches * chore: lint: fix lint errors with new linting config Ref: #11967 * chore: lint: address feedback from reviews * doc: eth: restore comment lost in linter cleanup Ref: #11968 * chore: libp2p: update to v0.34.1 (#12027) * update libp2p to v0.34.0 * fix libp2p err * fix imports * update go mod * update go mod * Update changelog Update changelog * go mod tidy go mod tidy * revert go version change (#12050) * Update changelog Update changelog * chore: backport #12054 to release/v1.27.0 branch (#12056) * chore: pin golanglint-ci to v1.58.2 (#12054) Fixes: #12044 * Add backport to changelog Add backport to changelog --------- Co-authored-by: Rod Vagg <[email protected]> * Bump version - make gen/make docsgen Bump version - make gen/make docsgen * Update changelog Update changelog * Bump NodeBuildVersion to v1.27.1-rc1 Bump NodeBuildVersion to v1.27.1-rc1 * Add Lotus-Miner / Curio related changes Add Lotus-Miner / Curio related changes * Update date and upgrade warnings Update date and upgrade warnings * fix: ci: do not use deprecated --debug goreleaser flag (#12086) * chore: deals: remove forgotten graphsync references (#12084) * chore: types: remove more items forgotten after markets (#12095) * chore: cleanup: remove more items forgotten after markets * .gz somehow reappeared after #11625 * fix: ETH RPC API: ETH Call should use the parent state root of the subsequent tipset (#11905) * fix eth call * tests * changes as per review * changes as per review * Update node/impl/full/eth.go Co-authored-by: Rod Vagg <[email protected]> * fix as per review --------- Co-authored-by: Rod Vagg <[email protected]> * Update changelog to RC2 Update changelog to RC2 * Make gen / make docsgen-cli Make gen / make docsgen-cli * chore: api: the Net API/CLI now remains only on daemon The only part of this repository that does lp2p is now lotus-daemon Remove the CommonNet type, used exclusively bu the CLI stack Adjust the rest of struct-memebership to match what went where End result best seen in diff of `documentation/en/api-v0-methods-miner.md` * Update changelog Update changelog * fix: events: sqlite db improvements (#12090) * fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: #12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures * Update CHANGELOG.md * build: release: v1.27.1-rc2 (#12101) * fix: ci: do not use deprecated --debug goreleaser flag (#12086) * chore: deals: remove forgotten graphsync references (#12084) * chore: types: remove more items forgotten after markets (#12095) * chore: cleanup: remove more items forgotten after markets * .gz somehow reappeared after #11625 * fix: ETH RPC API: ETH Call should use the parent state root of the subsequent tipset (#11905) * fix eth call * tests * changes as per review * changes as per review * Update node/impl/full/eth.go Co-authored-by: Rod Vagg <[email protected]> * fix as per review --------- Co-authored-by: Rod Vagg <[email protected]> * Update changelog to RC2 Update changelog to RC2 * Make gen / make docsgen-cli Make gen / make docsgen-cli * chore: api: the Net API/CLI now remains only on daemon The only part of this repository that does lp2p is now lotus-daemon Remove the CommonNet type, used exclusively bu the CLI stack Adjust the rest of struct-memebership to match what went where End result best seen in diff of `documentation/en/api-v0-methods-miner.md` * Update changelog Update changelog * fix: events: sqlite db improvements (#12090) * fix: events: sqlite db improvements * fix unclosed multi-row query * tune options to limit wal growth Ref: #12089 * fix: events: use correct context for CollectEvents transaction * fix: events: close prepared read statement * fix: events: close initial query; handle lint failures * Update CHANGELOG.md --------- Co-authored-by: Piotr Galar <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]> Co-authored-by: Aarsh Shah <[email protected]> Co-authored-by: Rod Vagg <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]> * small fix in changelog * fix: release: update goreleaser config file Fixes: #12120 * fix go releaser and test with rc3 * Update CHANGELOG.md * lotus v1.27.1 prep * address review - resolve one more conflicts - revert 2 new line added * doc: events: note events db migration impact --------- Co-authored-by: Phi-rjan <[email protected]> Co-authored-by: Rod Vagg <[email protected]> Co-authored-by: Steven Allen <[email protected]> Co-authored-by: smagdali <[email protected]> Co-authored-by: Aarsh Shah <[email protected]> Co-authored-by: Piotr Galar <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]> Co-authored-by: Peter Rabbitson <[email protected]>
Ref: #12089
--
This is a draft for now while I test it out. I've manually truncated my current, massive wal file (sqlite3 /path/to/sqlite/events.db
followed byPRAGMA wal_checkpoint(TRUNCATE);
) so I'm starting from zero again and will watch its behaviour with this and report back.