-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CCIP-2611] Report new heads to atlas' OTI #13647
Conversation
ae4e63b
to
9a1362a
Compare
request := &telem.HeadReportRequest{ | ||
ChainId: head.EVMChainID.String(), | ||
Timestamp: uint64(head.Timestamp.UTC().Unix()), | ||
BlockNumber: uint64(head.Number), | ||
BlockHash: head.Hash.Hex(), | ||
Finalized: head.IsFinalized, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this to be something like:
type Block struct {
Timestamp uint64
Number uint64
Hash string
}
type HeadReportRequest {
ChainId uint64
Latest Block
Finalized Block
}
Use head.LatestFinalizedHead()
to fetch finalized for a given latest head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. we want block info for both latest
and finalized
(at that latest point in time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
) | ||
|
||
func NewTelemetryReporter(chainContainer legacyevm.LegacyChainContainer, lggr logger.Logger, monitoringEndpointGen telemetry.MonitoringEndpointGenerator) *telemetryReporter { | ||
endpoints := make(map[*big.Int]commontypes.MonitoringEndpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pointer is not a good key for a map, as any new pointer (even with same value) will have a different place in memory; if it is chainID, it's safe to use uint64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
func NewTelemetryReporter(chainContainer legacyevm.LegacyChainContainer, lggr logger.Logger, monitoringEndpointGen telemetry.MonitoringEndpointGenerator) *telemetryReporter { | ||
endpoints := make(map[*big.Int]commontypes.MonitoringEndpoint) | ||
for _, chain := range chainContainer.Slice() { | ||
endpoints[chain.ID()] = monitoringEndpointGen.GenMonitoringEndpoint("EVM", chain.ID().String(), "", synchronization.HeadReport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we also need some form of check if telemetry is enabled in cfg? What do we see in the plugins? I assume this would fail if not, even though prometheus metrics are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
6ea8d84
to
8912d16
Compare
7c520d0
to
986c510
Compare
@@ -30,6 +30,7 @@ func TestTOMLGeneralConfig_Defaults(t *testing.T) { | |||
assert.False(t, config.StarkNetEnabled()) | |||
assert.Equal(t, false, config.JobPipeline().ExternalInitiatorsEnabled()) | |||
assert.Equal(t, 15*time.Minute, config.WebServer().SessionTimeout().Duration()) | |||
assert.Equal(t, false, config.HeadReport().TelemetryEnabled()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smartcontractkit/foundations do you think this should default to true? Or maybe a HeadReport.TelemetryDisabled
config instead? I see this being enabled for every (prod) node by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default should be true
, as it requires less configuration to get this feature up and running
defer hrd.wgDone.Done() | ||
ctx, cancel := hrd.chStop.NewCtx() | ||
defer cancel() | ||
after := time.After(hrd.reportPeriod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads-up: this is fixing a bug that was present in the original code, where periodically reported Prom metrics were inadvertently being set only if it went through (default=) 30s without a new head being detected;
Since the metrics being set periodically are different than the ones on new heads, this means those could go a long time (or even forever) without being updated.
Taking after
out of the for
loop ensures these are set unconditionally every reportPeriod
, as expected
7c261f4
to
045aee7
Compare
045aee7
to
6a5690d
Compare
3412162
to
6a721b8
Compare
Quality Gate passedIssues Measures |
CCIP-2611
Report new heads to atlas' OTI to track NOPs health (block delay)
Requires Dependencies
Add wsRPC endoint in Atlas's OTI: https://github.com/smartcontractkit/atlas/pull/6394/files
Resolves Dependencies
None