fix(splitstore): merry christmas lotus! Remove ~120 G from lotus datastore #12803

Open · wants to merge 21 commits into master
Conversation

ZenGround0 (Contributor) commented Dec 25, 2024

Related Issues

#10554

Proposed Changes

As part of the Christmas spirit I wanted to do something that everyone in the network would see as a gift. This year I decided to get us the gift of 100 fewer gigs in the lotus datastore.

Basically, today we hold onto two copies of the state: the original in the coldstore and a hot copy. As far as I can tell there is zero reason to do this.

This change puts the snapshot directly into the hotstore, since most of us are syncing from snapshots now.

Additional Info

As far as I can tell, the original flow of adding the snapshot to the coldstore and then warming up the hotstore was meant to fit the broader pattern of the splitstore working with existing large datastores. However, in today's environment the default and most commonly used mode of operation is to sync from a snapshot and run in discard mode.

Somebody tell me if this breaks something; it's running fine for me on mainnet right now.

Note: in the first version I removed the warmup procedure in favor of a config option for requesting a full warmup. At this point I think that is the wrong move. We would pay the additional complexity of a config entry just to skip some warmup work at startup. That work is cheap enough to run once, and by keeping it we automatically handle all upgrade paths to the splitstore without any user-facing complexity. In the common case we'll walk the chain and do Has checks with no blockstore puts required, which will be much cheaper than the current warmup.
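To make that last point concrete, here is a minimal sketch (my own illustration, not code from this PR) of what a Has-check-only warmup pass could look like. The `bstore` interface and the precomputed list of reachable CIDs are stand-ins; the real warmup walks the chain itself.

```go
package sketch

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// bstore is a minimal stand-in with the same method shapes as the lotus
// blockstore.Blockstore interface.
type bstore interface {
	Has(ctx context.Context, c cid.Cid) (bool, error)
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
	Put(ctx context.Context, b blocks.Block) error
}

// warmup copies only the objects the hotstore is missing. When the snapshot
// was imported straight into the hotstore, every Has check hits and no Put
// is needed, so the pass is cheap.
func warmup(ctx context.Context, hot, cold bstore, reachable []cid.Cid) error {
	for _, c := range reachable {
		ok, err := hot.Has(ctx, c)
		if err != nil {
			return err
		}
		if ok {
			continue // already hot: no blockstore put required
		}
		blk, err := cold.Get(ctx, c)
		if err != nil {
			return err
		}
		if err := hot.Put(ctx, blk); err != nil {
			return err
		}
	}
	return nil
}
```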

Checklist

Before you mark the PR ready for review, please make sure that:

@ZenGround0 ZenGround0 marked this pull request as ready for review December 25, 2024 20:01
@ZenGround0 ZenGround0 requested a review from rvagg December 25, 2024 20:01
@@ -64,7 +64,7 @@ type LockedRepo interface {
// The supplied context must only be used to initialize the blockstore.
// The implementation should not retain the context for usage throughout
// the lifecycle.
- Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, error)
+ Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error)
Member: would be good to document what the returned func is in the comment; I had to go to the bottom of the only function that implements it to find the closer.

Member: or type it, `type CloserFn func() error`
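A sketch of what that typed closer could look like (illustrative only, not the PR's code; the doc comments and the trimmed-down interface are mine):

```go
package sketch

import (
	"context"

	"github.com/filecoin-project/lotus/blockstore"
)

// CloserFn releases the resources behind a Blockstore returned by
// LockedRepo.Blockstore; call it once the store is no longer needed.
type CloserFn func() error

// BlockstoreDomain stands in for the repo.BlockstoreDomain type.
type BlockstoreDomain string

// LockedRepo is trimmed to the one method under discussion.
type LockedRepo interface {
	Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, CloserFn, error)
}
```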

fsr.bsErr = err
return
}
func (fsr *fsLockedRepo) Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error) {
Member: document the returned func here, or type it

@@ -52,7 +52,7 @@ var dealLabelCmd = &cli.Command{

defer lkrepo.Close() //nolint:errcheck

- bs, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
+ bs, _, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
Member: should we be using the closer in all of these?
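For these CLI call sites, a minimal sketch of what actually using the closer could look like (closeBs is just a name for the second return value being discussed):

```go
bs, closeBs, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
if err != nil {
	return err
}
defer closeBs() //nolint:errcheck // release the blockstore alongside the repo
```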

return bs, nil
}

func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
return func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
Member:
Suggested change
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {

I think this is both unnecessary and could cause a problem. It ends up passing func(...) (dtypes.SplitBlockstore, func() error, error) to fx.Provide, which takes all return values except the last (if it's an error) as types to feed into the DI system, available to match anything that wants them. So we end up giving it both a dtypes.SplitBlockstore and a (noop) func() error. I'm not sure whether fx will use it when it's not a named type, but it's in the DI system at least, and perhaps someone ends up adding code that requires a func() error, magically gets this noop one, and then can't figure out why it's not doing what they expect.
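A standalone toy example of that failure mode (the Splitstore type and constructor here are placeholders, not lotus code):

```go
package main

import (
	"fmt"

	"go.uber.org/fx"
)

type Splitstore struct{}

// The trailing error is consumed by fx, but BOTH *Splitstore and func() error
// become types that the DI container can inject elsewhere.
func newSplitstore() (*Splitstore, func() error, error) {
	return &Splitstore{}, func() error { return nil }, nil
}

func main() {
	app := fx.New(
		fx.Provide(newSplitstore),
		// Any constructor or invoke target asking for a plain func() error
		// now silently receives the no-op closer above.
		fx.Invoke(func(closer func() error) {
			fmt.Println("got a closer from DI:", closer())
		}),
	)
	if err := app.Err(); err != nil {
		fmt.Println("fx error:", err)
	}
}
```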

rvagg (Member) commented Jan 13, 2025

Well, it's already been quite an education trying to get my head around what this is doing and why it would work. Some things I've discovered that I didn't really understand previously:

  • ./datastore/chain is the 120G that we're "saving"
  • We're likely saving a fair bit less than that because: (a) we still need all the chain, regardless of where it's stored, and (b) it's only the churned state that would have been GCd that we'd be saving, there's a lot of static state that either never changes or won't change for a long time that we won't be saving—but the savings would add up over time for a splitstore node.
    • My chain is 120G but my splitstore is 62G. If we consider that a typical snapshot gives me approximately what splitstore's compaction gives me in terms of historical state, then I'm only wasting somewhere below 62G (chain growth is included in there so it is less), because a lot of that 120G is still useful - the chain, and static state. And I've been running this splitstore node since May.
  • This original behaviour looks to me like it comes from the original work being split between two PRs: segregate chain and state blockstores #5695 & hot/cold blockstore segregation (aka. splitstore) #4992. The former introduced the "split" blockstore that was intended to be able to have multiple backing stores for different purposes (a pipe dream ever-noticeable through Lotus code), and it was that PR that handled the Import case. Then the second PR introduced the hot store with GC, but it didn't touch the import case.
  • One of my main questions was whether this would impact the chain history that we need to have available: does moving it out of the universal store and into the hot store make it a candidate for GC? The answer is no. SplitStore#compact() is quite clear that it walks the chain, protects the block headers and even the tipset keys, but it only walks 4 finalities of stateroots (I wish this number were configurable, that'd be so much more helpful than the other splitstore config values); a sketch of that boundary rule follows this list.
  • SplitStore almost always will look in hot first and then look in cold, so it's this second read where we find most of the historic stuff that got imported when you started.
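The boundary-rule sketch mentioned above, in my own words rather than the real SplitStore#compact code (the 900-epoch finality constant is Filecoin's current policy value; the function and constant names are mine):

```go
package sketch

// Chain objects (block headers, tipset keys) are always protected during
// compaction; state roots are only walked within the last 4 finalities.
const finalityEpochs = 900 // Filecoin's chain finality, in epochs

// walkStateFor reports whether compaction would walk the state tree of a
// tipset at tipsetEpoch, given the current head epoch.
func walkStateFor(tipsetEpoch, headEpoch int64) bool {
	boundary := headEpoch - 4*finalityEpochs
	return tipsetEpoch >= boundary
}
```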

So I think this should be perfectly fine to do.

But I wonder if it might be good to ship this along with a super-simple lotus-shed command that you can run on a closed repo to manually copy everything from the existing cold store into the hot store. Then you could take advantage of this without a resync. If that's too tricky then it's probably not worth it, but it might be neat to ship this with that option.
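To illustrate, a rough sketch of what such a pass could do (the command doesn't exist; copyColdToHot and the direct use of the blockstore interfaces here are my assumptions):

```go
package sketch

import (
	"context"

	"github.com/filecoin-project/lotus/blockstore"
)

// copyColdToHot walks every key in the coldstore and copies any block the
// hotstore doesn't already have. Intended to run offline, on a closed repo.
func copyColdToHot(ctx context.Context, cold, hot blockstore.Blockstore) error {
	keys, err := cold.AllKeysChan(ctx)
	if err != nil {
		return err
	}
	for c := range keys {
		have, err := hot.Has(ctx, c)
		if err != nil {
			return err
		}
		if have {
			continue // already in the hotstore
		}
		blk, err := cold.Get(ctx, c)
		if err != nil {
			return err
		}
		if err := hot.Put(ctx, blk); err != nil {
			return err
		}
	}
	return nil
}
```

A real version would presumably batch puts and report progress, but the shape is just a single pass over the cold keys.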

@@ -9,6 +9,7 @@

# UNRELEASED

- Sync snapshots directly to hotstore. Save 120 G of lotus disk space. ([filecoin-project/lotus#12800](https://github.com/filecoin-project/lotus/pull/12803))
Member: As per my comments, "120 G" is overselling it by quite a bit; we need to nuance this and probably not even put a number on it.

ZenGround0 (Contributor, Author) commented Jan 13, 2025:
Technically correct @rvagg, but this is not really the Christmas spirit. 🎅 😢

I really don't buy that this is significantly overselling it. It's just the difference in startup sizes immediately after syncing from a snapshot, and a lot of us do that pretty often. Plus it looks like it's also exactly the high-water mark, which is what matters when you need to get bigger disks.

I guess we should have a graph of hotstore size over time to quantify it. After 8 months your hotstore has leaned down as the state has replaced itself, but it's a slow process.

ZenGround0 (Contributor, Author): Thinking about it more, I don't understand how you're seeing the numbers you do, and the original claim seems to be incorrect. See the latest comment in the main thread.

ZenGround0 (Contributor, Author): I think you just missed the full copy of all cold objects to hot in the warmup procedure: https://github.com/filecoin-project/lotus/blob/master/blockstore/splitstore/splitstore_warmup.go#L89-L109

rvagg (Member) commented Jan 13, 2025

Another question I have that might be important: is this going to increase CPU usage for hotstore compaction? Do we take shortcuts if we find that a CID is in the coldstore and essentially skip over it, while being more rigorous about hotstore objects? I don't think so, but it's not crystal clear to me that this is the case.

ZenGround0 (Contributor, Author) commented Jan 13, 2025

@rvagg your numbers aren't squaring with my understanding of the system. Snapshot size is 1 state + 2.2 finalities of churn. After compaction we should have 1 state + 4 finalities of churn, so the hotstore should never be below snapshot size, aka 120 G.

In the discussion above there is an assumption that data in the universal store is "useful", but I don't think that's ever the case. The discard store is backed by the universal store, but (1) it's read-only and nothing is ever flushed to it, and (2) everything in the universal store is copied to the hotstore upon warmup. So the only time the universal store uniquely persists a block is after that block has been discarded from the hotstore, at which point it is definitionally not useful.

My claim above that slowly replaced churn explains your datastore size is incorrect: the latest 4 finalities of state should always be kept in the hotstore.

Since you're seeing a 60 G splitstore dir, I'm clearly off about something here; do you see what I'm missing?
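For a rough sense of the two windows being compared, here's my own back-of-envelope (assuming Filecoin's 30-second epochs and 900-epoch finality; these are not numbers from the PR):

```go
package main

import "fmt"

func main() {
	const (
		epochSecs = 30.0  // Filecoin block time in seconds
		finality  = 900.0 // epochs per finality
	)
	snapshotChurn := 2.2 * finality  // churn window in a fresh snapshot
	compactedChurn := 4.0 * finality // churn window the hotstore keeps after compaction
	fmt.Printf("snapshot churn ≈ %.0f epochs (~%.1f h)\n", snapshotChurn, snapshotChurn*epochSecs/3600)
	fmt.Printf("post-compaction churn = %.0f epochs (~%.0f h)\n", compactedChurn, compactedChurn*epochSecs/3600)
}
```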

Labels: none yet
Projects: Status: 🔎 Awaiting review
Participants: 2