fix(store): set optimal badger config to avoid memory spikes #1072
store/badger.go
Outdated
gcTimeout = 1 * time.Minute
discardRatio = 0.5
Could you add a comment justifying these values? A link to a relevant forum thread, or the reasoning, would help.
done
store/badger.go
Outdated
ctx := context.Background()
for {
	select {
	case <-ctx.Done():
		return
ctx is useless: context.Background() is never cancelled, so the ctx.Done() case can never fire
fixed
	case <-gcTimeout.C:
		err := b.db.RunValueLogGC(discardRatio)
		if err != nil {
			continue
better to log?
log added
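A minimal sketch of what logging instead of silently continuing could look like. It is not the PR's code: `valueLogGC`, `errNothingToCollect`, `errDiskIO`, `stubDB`, and the `logf` callback are hypothetical names introduced for illustration (the PR passes a `types.Logger`, whose methods are not shown in this thread). The sketch assumes that a "nothing to rewrite" result is routine and only other errors deserve a log line.

```go
package main

import "errors"

// valueLogGC is a hypothetical interface covering the single call the loop
// makes; the thread's b.db.RunValueLogGC(discardRatio) has this shape.
type valueLogGC interface {
	RunValueLogGC(discardRatio float64) error
}

// errNothingToCollect is a hypothetical sentinel for "GC ran but found
// nothing to rewrite", a routine outcome that need not be logged.
var errNothingToCollect = errors.New("nothing to collect")

// errDiskIO is a hypothetical unexpected failure, used only for illustration.
var errDiskIO = errors.New("disk I/O failure")

// stubDB is a test double that returns a preset error from every GC call.
type stubDB struct{ err error }

func (s stubDB) RunValueLogGC(float64) error { return s.err }

// gcOnce runs a single GC pass. Unexpected errors are passed to logf and
// reported as true; success and the routine sentinel return false.
func gcOnce(db valueLogGC, discardRatio float64, logf func(format string, args ...interface{})) bool {
	err := db.RunValueLogGC(discardRatio)
	if err == nil || errors.Is(err, errNothingToCollect) {
		return false
	}
	logf("badger value log gc failed: %v", err)
	return true
}
```

The ticker loop would call `gcOnce` on every tick and keep going regardless of the result, so a failed pass is visible in the logs but does not stop later passes.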
94ae217 to 219f2e0
Looks good, just not sure about this conditional code based on the context.
store/badger.go
Outdated
if ctx != context.TODO() {
	go b.gc(ctx, gcTimeout, discardRatio, logger)
bit weird
can you comment?
What exactly is the problem with running the gc even with context.TODO()?
Can't you just handle ErrGCInMemoryMode at the call site?
I think it does not make sense. I actually removed the context.
store/badger.go
Outdated
func (b *BadgerKV) gc(period time.Duration, discardRatio float64, logger types.Logger) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for range ticker.C {
add graceful shutdown handling
and probably godoc
case <-db.ctx.Done():
// Exit the function if the database context is cancelled
return
}
channel added as an exit case
I thought the db exposed a ctx.
from the docs:
// Only one GC is allowed at a time. If another value log GC is running, or DB
// has been closed, this would return an ErrRejected.
Maybe it's more elegant to just check for this type of error.
But if we check for ErrRejected, it may be that the previous GC run didn't finish for some reason, for instance, and we would kill the loop when it was not necessary...
fine
Even though I'm not sure how "maybe the previous didn't finish" is possible. Does RunValueLogGC actually run in the background?
Yes, sorry, this shouldn't happen, but from the docs I'm not sure whether ErrRejected only happens when closing the db...
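One way to read the concern above: since the quoted docs say the rejection can mean either "another GC is running" or "DB has been closed", the error alone cannot decide whether the loop should exit. A hedged sketch of pairing it with an explicit close signal follows. `errRejected` is a local stand-in for the ErrRejected mentioned in the docs quote, and `shouldStop` and the `closed` channel are hypothetical names, not part of the PR.

```go
package main

import "errors"

// errRejected is a local stand-in for the ErrRejected described in the
// quoted docs: returned when another GC is running OR the DB is closed.
var errRejected = errors.New("value log GC request rejected")

// shouldStop reports whether the GC loop should exit after a pass returned
// err. A rejection only ends the loop when the (hypothetical) closed channel
// confirms the store is shutting down; otherwise the loop retries next tick.
func shouldStop(err error, closed <-chan struct{}) bool {
	if !errors.Is(err, errRejected) {
		return false
	}
	select {
	case <-closed:
		return true // rejected because the store was closed
	default:
		return false // rejected for another reason, e.g. an overlapping GC
	}
}
```

With this shape an overlapping run costs one skipped tick instead of ending the loop for good.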
// Not really necessary if disabling compression
opts.BlockCacheSize = 0
// compression reduces storage usage but increases memory consumption, especially during compaction
opts.Compression = options.None
no compression at all?
maybe we need to separate between sequencer/full node/archive
The problem with the dymint store is not storage space but memory consumption during compaction. It's true this is not optimal (the size can double without compression), but given that in an unpruned node the application db can take more than 10x the dymint storage space, storage does not seem to be the issue. Also, this is a short-term solution to avoid OOM issues in nodes. The long-term solution will probably be to replace badger with a store better suited to dymint's requirements.
The closing channel is a bit weird; that's the point of a context, which is what was there before.
So a context would make more sense?
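For comparison, a sketch of the context-based variant this comment suggests, combining the ticker loop with a cancellation case. This is an illustration under assumptions, not the merged code: `valueLogGC`, `countingDB`, `runAndCancel`, and the `logf` callback are hypothetical, and the caller is assumed to cancel `ctx` when the store is closed.

```go
package main

import (
	"context"
	"time"
)

// valueLogGC is a hypothetical interface for the one call the loop makes.
type valueLogGC interface {
	RunValueLogGC(discardRatio float64) error
}

// gc runs a value log GC pass every period until ctx is cancelled.
func gc(ctx context.Context, db valueLogGC, period time.Duration, discardRatio float64, logf func(string, ...interface{})) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown: the owner cancelled the context
		case <-ticker.C:
			if err := db.RunValueLogGC(discardRatio); err != nil {
				logf("badger value log gc: %v", err)
			}
		}
	}
}

// countingDB is a test double that signals each GC pass without blocking.
type countingDB struct{ calls chan struct{} }

func (c countingDB) RunValueLogGC(float64) error {
	select {
	case c.calls <- struct{}{}:
	default: // nobody is listening; drop the signal
	}
	return nil
}

// runAndCancel waits for n observed passes, cancels the context, and reports
// whether the loop goroutine exited within one second.
func runAndCancel(n int) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	db := countingDB{calls: make(chan struct{})}
	done := make(chan struct{})
	go func() {
		gc(ctx, db, time.Millisecond, 0.5, func(string, ...interface{}) {})
		close(done)
	}()
	for i := 0; i < n; i++ {
		<-db.calls
	}
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}
```

Compared with a dedicated closing channel, the context carries the same stop signal while letting the caller reuse whatever cancellation it already has.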
5c404e1 to 70dc4b7
(cherry picked from commit 5f8b1f7)
PR Standards
Opening a pull request should be able to meet the following requirements
--
PR naming convention: https://hackmd.io/@nZpxHZ0CT7O5ngTp0TP9mg/HJP_jrm7A
Close #1067
<-- Briefly describe the content of this pull request -->
For Author:
godoc
comments
For Reviewer:
After reviewer approval: