fix(exporter): memory leak and lock contentions #2034
base: stage
Conversation
# Conflicts: # message/validation/consensus_validation.go # message/validation/partial_validation.go
# Conflicts: # message/validation/common_checks.go # message/validation/consensus_validation.go # message/validation/const.go # message/validation/validation.go
operator/validator/controller.go
Outdated
```go
c.committeesObserversMutex.RLock()

// Check if the observer already exists
existingObserver, ok := c.committeesObservers.Get(msgID)
if ok {
	c.committeesObserversMutex.RUnlock()
	return existingObserver
}

c.committeesObserversMutex.RUnlock()

c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()
```
I think it's a potential race condition, because another goroutine might also see non-existence in between our RUnlock -> Lock and then overwrite us later.
If we want to keep this optimization, we could re-check existence after Lock and exit if it already exists.
However, if you don't think it's a big deal for performance, then we can just do this:
```diff
-c.committeesObserversMutex.RLock()
-// Check if the observer already exists
-existingObserver, ok := c.committeesObservers.Get(msgID)
-if ok {
-	c.committeesObserversMutex.RUnlock()
-	return existingObserver
-}
-c.committeesObserversMutex.RUnlock()
-c.committeesObserversMutex.Lock()
-defer c.committeesObserversMutex.Unlock()
+c.committeesObserversMutex.Lock()
+defer c.committeesObserversMutex.Unlock()
+// Check if the observer already exists
+existingObserver, ok := c.committeesObservers.Get(msgID)
+if ok {
+	return existingObserver
+}
```
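For reference, the re-check-after-Lock variant mentioned above could look roughly like this (a sketch reusing the names from the snippet, not the final implementation):

```go
c.committeesObserversMutex.RLock()
existingObserver, ok := c.committeesObservers.Get(msgID)
c.committeesObserversMutex.RUnlock()
if ok {
	return existingObserver
}

c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()

// Re-check under the write lock: another goroutine may have created the
// observer between the RUnlock and Lock above.
if existingObserver, ok := c.committeesObservers.Get(msgID); ok {
	return existingObserver
}

// ... create the new observer and store it in c.committeesObservers here ...
```

This keeps the read-lock fast path while closing the window where two goroutines could both see a miss and overwrite each other.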
I guess this optimization is not necessary; I tried it before the last one that worked. I'll revert.
```diff
-	committeesObservers      *ttlcache.Cache[spectypes.MessageID, *committeeObserver]
-	committeesObserversMutex sync.Mutex
 	// committeeObservers is a cache of initialized committeeObserver instances
+	committeesObservers *hashmap.Map[spectypes.MessageID, *committeeObserver] // todo: need to evict?
```
We don't have to; it's an optimization to stay lean by evicting:
- removed/liquidated/exited validators
- proposal duties
If you don't delete them, they accumulate over time. However, we can try without it and revert if needed.
I agree we should eventually evict, but I felt a TTL cache might not be the best fit here. Maybe we could remove entries on actual events in the future.
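As a rough illustration of event-driven eviction (a hypothetical helper, assuming the map type exposes a Del method as github.com/cornelk/hashmap does):

```go
// evictObservers is a hypothetical helper that handlers for removed,
// liquidated, or exited validators could call to drop their cached
// committee observers, keeping the map lean without a TTL.
func (c *controller) evictObservers(msgIDs []spectypes.MessageID) {
	for _, id := range msgIDs {
		c.committeesObservers.Del(id)
	}
}
```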
```go
// nonCommitteeInstanceContainerCapacity returns the capacity of InstanceContainer for non-committee validators
func nonCommitteeInstanceContainerCapacity(fullNode bool) int {
	if fullNode {
		// Helps full nodes reduce
		return 2
	}
	return 1
}
```
is it unused?
@nkryuchkov why do we need locks in ValidatorState and OperatorState, or really anything under the top-level item (ConsensusState?), if the top-level item is already locked?
@moshe-blox I think you're right, perhaps we could remove them
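To illustrate the locking pattern under discussion (generic names, not the PR's actual types): when all access to nested state goes through the parent while it holds its lock, the nested structs don't need mutexes of their own.

```go
type inner struct {
	count int // no mutex: only touched while outer.mu is held
}

type outer struct {
	mu     sync.Mutex
	inners []*inner
}

func (o *outer) bump(i int) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.inners[i].count++
}
```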
Codecov Report: patch coverage and impacted files are available in the full report in Codecov by Sentry.
Minor suggestions
```go
cs := &ValidatorState{
	operators:       make([]*OperatorState, len(committee)),
	storedSlotCount: MaxStoredSlots(mv.netCfg),
}
```
It's probably more readable to have a constructor func for ValidatorState, such as NewValidatorState, so that we don't forget to initialize operators appropriately.
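A minimal sketch of what such a constructor could look like, based on the fields in the snippet above (the name and signature are assumptions):

```go
// NewValidatorState initializes every field in one place so callers
// can't forget the operators slice.
func NewValidatorState(committeeSize int, storedSlotCount phase0.Slot) *ValidatorState {
	return &ValidatorState{
		operators:       make([]*OperatorState, committeeSize),
		storedSlotCount: storedSlotCount,
	}
}
```

The call site above would then become something like cs := NewValidatorState(len(committee), MaxStoredSlots(mv.netCfg)).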
```go
// ValidatorState keeps track of the signers for a given public key and role.
type ValidatorState struct {
	operators       []*OperatorState
	storedSlotCount phase0.Slot
```
While at it, it would be nice to add explanatory comments about what OperatorState and ValidatorState are (e.g. what storedSlotCount is used for, and why we have one signer per slot in OperatorState).
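For example, the requested comments could look something like this (the wording is a guess at the intent and should be confirmed by the author):

```go
// ValidatorState keeps track of the signers for a given public key and role.
type ValidatorState struct {
	// operators holds per-operator message state, one entry per committee member.
	operators []*OperatorState
	// storedSlotCount is how many recent slots of state are retained before older slots are pruned.
	storedSlotCount phase0.Slot
}
```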
```diff
@@ -0,0 +1,197 @@
+package validation
+
+// message_counts.go contains code for counting and validating messages per validator-slot-round.
```
This comment is probably unnecessary? I'm not sure what it's about.
```diff
@@ -49,27 +50,34 @@ type messageValidator struct {
 }
 
+// New returns a new MessageValidator with the given network configuration and options.
+// It starts a goroutine that cleans up the state.
```
It's probably unnecessary to mention // It starts a goroutine that cleans up the state. (since it's obvious from the code below), unless it has some specific ramifications we want to point out.
```diff
@@ -40,6 +41,8 @@ type CommitteeObserver struct {
 	syncCommRoots *ttlcache.Cache[phase0.Root, struct{}]
 	domainCache   *DomainCache
 	// TODO: consider using round-robin container as []map[phase0.ValidatorIndex]*ssv.PartialSigContainer similar to what is used in OperatorState
+
+	pccMtx *sync.Mutex
```
There probably isn't a need for pccMtx to be a pointer here, since we access it through *CommitteeObserver, which is itself a pointer.
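In other words, the suggested change is roughly this (a sketch; the method name is hypothetical and stands in for whatever code locks pccMtx):

```go
type CommitteeObserver struct {
	// ...other fields as in the diff above...

	// A value-type mutex is fine here: CommitteeObserver is always handled
	// via a pointer, so the mutex is never copied.
	pccMtx sync.Mutex
}

func (o *CommitteeObserver) withPCCLock(fn func()) {
	o.pccMtx.Lock()
	defer o.pccMtx.Unlock()
	fn()
}
```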