fix(splitstore): merry christmas lotus! Remove ~120 G from lotus datastore #12803
base: master
Conversation
@@ -64,7 +64,7 @@ type LockedRepo interface {
 	// The supplied context must only be used to initialize the blockstore.
 	// The implementation should not retain the context for usage throughout
 	// the lifecycle.
-	Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, error)
+	Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error)
Would be good to document what the returned `func` is in the comment; I had to go to the bottom of the only function that implements it to find the closer.
Or type it: `type CloserFn func() error`.
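A minimal sketch of how that could read; aside from the method shape shown in the diff above, every type here is a stand-in rather than the actual repo package:

```go
// Sketch only: a named closer type, with its behaviour documented on the
// interface method. Blockstore and BlockstoreDomain are stand-ins for the
// real lotus types, not the merged code.
package repo

import "context"

// Blockstore stands in for blockstore.Blockstore in this sketch.
type Blockstore interface{}

// BlockstoreDomain selects which blockstore a LockedRepo should return.
type BlockstoreDomain string

// CloserFn releases the resources backing a blockstore returned from
// LockedRepo.Blockstore. Callers should invoke it exactly once when done.
type CloserFn func() error

// LockedRepo is trimmed to the one method under discussion.
type LockedRepo interface {
	// Blockstore returns the blockstore for the given domain.
	// The supplied context must only be used to initialize the blockstore;
	// the implementation should not retain it. The returned CloserFn must
	// be called once the caller is finished with the blockstore.
	Blockstore(ctx context.Context, domain BlockstoreDomain) (Blockstore, CloserFn, error)
}
```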
		fsr.bsErr = err
		return
	}

func (fsr *fsLockedRepo) Blockstore(ctx context.Context, domain BlockstoreDomain) (blockstore.Blockstore, func() error, error) {
document the returned func here, or type it
@@ -52,7 +52,7 @@ var dealLabelCmd = &cli.Command{

 		defer lkrepo.Close() //nolint:errcheck

-		bs, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
+		bs, _, err := lkrepo.Blockstore(ctx, repo.UniversalBlockstore)
should we be using the closer in all of these?
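If so, a hedged sketch of what each call site could look like when the closer is kept instead of discarded; the types here are stand-ins for the real repo package, not the code in this PR:

```go
// Hypothetical call-site pattern: capture the closer rather than dropping
// it with `_`, and defer it next to the existing repo Close.
package cmdsketch

import "context"

// Stand-ins for the real repo types, just so the pattern compiles.
type Blockstore interface{}
type BlockstoreDomain string

const UniversalBlockstore BlockstoreDomain = "universal"

type lockedRepo interface {
	Blockstore(ctx context.Context, d BlockstoreDomain) (Blockstore, func() error, error)
	Close() error
}

// run mirrors the command body: close the repo and the blockstore on exit.
func run(ctx context.Context, lkrepo lockedRepo) error {
	defer lkrepo.Close() //nolint:errcheck

	bs, closeBs, err := lkrepo.Blockstore(ctx, UniversalBlockstore)
	if err != nil {
		return err
	}
	defer closeBs() //nolint:errcheck

	_ = bs // ... use the blockstore as the command does today ...
	return nil
}
```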
 	return bs, nil
 }

-func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
-	return func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
+func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
Suggested change:
-func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, func() error, error) {
+func SplitBlockstore(cfg *config.Chainstore) func(lc fx.Lifecycle, r repo.LockedRepo, ds dtypes.MetadataDS, cold dtypes.ColdBlockstore, hot dtypes.HotBlockstore) (dtypes.SplitBlockstore, error) {
I think this is both unnecessary and could cause a problem. It ends up passing `func(...) (dtypes.SplitBlockstore, func() error, error)` to `fx.Provide`, which takes all return values except the last (if the last is an `error`) as types to feed into the DI system, available to match to anything that wants them. So we end up giving it both a `dtypes.SplitBlockstore` and a (noop) `func() error`. I'm not sure if fx is going to use it if it's not got a named type, but it's in the DI system at least, and perhaps someone ends up adding some code that requires a `func() error`, magically gets this noop one, and then can't figure out why it's not doing what they expect.
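To make the concern concrete, here is a toy reproduction, assuming fx's documented Provide semantics rather than lotus's actual module graph: every non-error return of a provided constructor is registered in the DI container, so a bare `func() error` becomes injectable. Whether anything in lotus would ever request one is exactly the open question above.

```go
// Toy reproduction (not lotus wiring) of the fx.Provide concern.
// Everything except the go.uber.org/fx import is made up for illustration.
package main

import (
	"fmt"

	"go.uber.org/fx"
)

// SplitBlockstore stands in for dtypes.SplitBlockstore.
type SplitBlockstore struct{}

// newSplitBlockstore mimics the shape under discussion: blockstore,
// no-op closer, error.
func newSplitBlockstore() (SplitBlockstore, func() error, error) {
	return SplitBlockstore{}, func() error { return nil }, nil
}

func main() {
	app := fx.New(
		fx.Provide(newSplitBlockstore),
		// A consumer declaring a bare func() error dependency receives the
		// no-op closer above, which is the surprise being warned about.
		fx.Invoke(func(c func() error) {
			fmt.Println("injected func() error returned:", c())
		}),
	)
	if err := app.Err(); err != nil {
		fmt.Println("fx error:", err)
	}
}
```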
Well, it's already been quite an education trying to get my head around what this is doing and why it would work. Some things I've discovered that I didn't really understand previously:
So I think this should be perfectly fine to do. But I wonder if it might be good to ship this along with a super-simple lotus-shed command that you can run on a closed repo that would manually copy everything from the existing cold store into the hot store. Then you can take advantage of this without a resync. If that's too tricky then it's probably not worth it, but it might be neat to ship this with that option.
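A purely hypothetical sketch of what that lotus-shed helper could do; the interfaces and the function here are invented for illustration (the real tool would work with the lotus blockstore types and cid keys, against a closed repo as suggested):

```go
// Hypothetical "copy cold into hot" migration sketch; none of this exists
// in lotus-shed today.
package shedsketch

import (
	"context"
	"fmt"
)

// Block and Blockstore are minimal stand-ins for the real interfaces.
type Block interface{}

type Blockstore interface {
	AllKeysChan(ctx context.Context) (<-chan string, error)
	Has(ctx context.Context, key string) (bool, error)
	Get(ctx context.Context, key string) (Block, error)
	Put(ctx context.Context, key string, blk Block) error
}

// copyColdToHot walks every block in the cold store and copies anything
// the hot store is missing: afterwards the hot store holds what it would
// have held had the snapshot been imported directly into it.
func copyColdToHot(ctx context.Context, cold, hot Blockstore) error {
	keys, err := cold.AllKeysChan(ctx)
	if err != nil {
		return err
	}
	var copied int
	for k := range keys {
		have, err := hot.Has(ctx, k)
		if err != nil {
			return err
		}
		if have {
			continue
		}
		blk, err := cold.Get(ctx, k)
		if err != nil {
			return err
		}
		if err := hot.Put(ctx, k, blk); err != nil {
			return err
		}
		copied++
	}
	fmt.Printf("copied %d blocks from cold to hot\n", copied)
	return nil
}
```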
@@ -9,6 +9,7 @@

 # UNRELEASED

+- Sync snapshots directly to hotstore. Save 120 G of lotus disk space. ([filecoin-project/lotus#12800](https://github.com/filecoin-project/lotus/pull/12803))
As per my comments, "120 G" is overselling it by quite a bit; we need to nuance this and probably not even put a number on it.
Technically correct @rvagg, but this is not really the Christmas spirit. 🎅 😢
I really don't buy that this is significantly overselling it. It's just the difference in startup sizes immediately after syncing from snapshot, and a lot of us do that pretty often. Plus it looks like it's also exactly the high-water mark, which matters because that's the value that counts when you need to get bigger disks.
I guess we should have a graph of hotstore size over time to quantify it. After 8 months your hot store has leaned down as the state has replaced itself, but it's a slow process.
Thinking about it more, I don't understand how you're seeing the numbers you do, and the original claim seems to be incorrect. See the latest comment in the main thread.
I think you just missed the full copy of all cold objects to hot in the warmup procedure: https://github.com/filecoin-project/lotus/blob/master/blockstore/splitstore/splitstore_warmup.go#L89-L109
Another question I have about this that might be important: is it going to increase CPU usage for hotstore compaction? Do we take shortcuts if we find that a CID is in the coldstore and essentially skip over it, while being more rigorous about hotstore objects? I don't think so, but it's not crystal clear to me that this is the case.
@rvagg your numbers aren't squaring up with my understanding of the system. Snapshot size is 1 state + 2.2 finalities of churn. After compaction we should have 1 state + 4 finalities of churn. So the hotstore should never be below snapshot size, aka 120 G. In the discussion above there is an assumption that data in the universal store is "useful", but I don't think that's ever the case. The discard store is backed by the universal store, but 1) it's read-only and nothing is ever flushed to it, and 2) everything in the universal store is copied to the hotstore upon warmup. So the only time the universal store uniquely persists a block is after that block has been discarded from the hotstore, at which point it is definitionally not useful. My claims above about the churn being slowly replaced explaining your datastore size are incorrect. The latest 4 finalities of state should always be kept in the hotstore. Since you're seeing a 60 G splitstore dir, clearly I'm off about something here; do you see what I'm missing?
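For clarity, the size model in that comment written out symbolically; no concrete numbers are assumed here, only the relationship between the two quantities:

```go
// Symbolic restatement of the model above: for any state size S and
// per-finality churn C (both >= 0), hotstoreSize(S, C) >= snapshotSize(S, C),
// so under this model the hotstore should never dip below snapshot size.
package sizemodel

// snapshotSize models "1 state + 2.2 finalities of churn".
func snapshotSize(stateGiB, churnPerFinalityGiB float64) float64 {
	return stateGiB + 2.2*churnPerFinalityGiB
}

// hotstoreSize models "1 state + 4 finalities of churn" after compaction.
func hotstoreSize(stateGiB, churnPerFinalityGiB float64) float64 {
	return stateGiB + 4*churnPerFinalityGiB
}
```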
Related Issues
#10554
Proposed Changes
As part of the Christmas spirit I wanted to do something that everyone in the network would see as a gift. This year I decided to get us the gift of 100 fewer gigs in the lotus datastore.
Basically, today we hold onto two copies of the state: one original in the cold store and one hot copy. As far as I can tell there is zero reason to do this.
This change puts the snapshot directly into the hotstore, since most of us are syncing from snapshots now.
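As a rough illustration of the idea only (not the PR's actual wiring; the config field and function below are invented for this sketch): the snapshot import would target the hot blockstore whenever the splitstore is enabled, so only one copy of the state is ever written.

```go
// Hypothetical illustration: choose which blockstore a snapshot import
// should write into. Nothing here is the PR's real code.
package importsketch

// Blockstore stands in for the lotus blockstore interface.
type Blockstore interface{}

// ChainstoreConfig is a cut-down, invented config shape.
type ChainstoreConfig struct {
	SplitstoreEnabled bool
}

// snapshotImportTarget returns the blockstore that should receive the
// snapshot CAR: the hot store when the splitstore is in use (so the state
// is written once, not duplicated into cold), otherwise the universal store.
func snapshotImportTarget(cfg ChainstoreConfig, universal, hot Blockstore) Blockstore {
	if cfg.SplitstoreEnabled {
		return hot
	}
	return universal
}
```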
Additional Info
As far as I can tell, the original "add snapshot to cold, then warm up hot" flow was meant to fit the broader pattern of the splitstore working with existing large datastores. However, in today's environment the default and most commonly used mode of operation is to sync from a snapshot and run in discard mode.
Somebody tell me if this breaks something; I'm running pretty well on mainnet right now.
Note: in the first version I removed the warmup procedure in favor of a config for setting full warmup. At this point I think that is the wrong move: we would pay the additional complexity of one config entry just to skip warmup work at startup. This work is cheap enough to run once, and by including it we automatically handle all upgrade paths to using splitstore without any user complexity. And in the common case we'll walk the chain and do `has` checks with no blockstore puts required, which will be much cheaper than the current warmup.
Checklist
Before you mark the PR ready for review, please make sure that: