Add alerts, integrated with Slack and OpsGenie, that trigger when the ingest rate slows down and the provider lag grows. We already have an alert for the ingest rate stopping for more than an hour, but it is not catching these gap-in-ingest issues.
We should look at existing alternative leading indicators to alert on. Namely:

- Probelab providers, which check lookup success for CIDs within 5 minutes of their publication.
- The lag value reported for providers on the /provider backends. In both recent incidents, the NFT.Storage lag on the /provider backends grew consistently. The lag for this particular provider should typically remain below 20.
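The lag indicator above could be turned into an alert rule along these lines. This is a minimal sketch, not the actual alerting config: the threshold of 20 comes from the typical lag mentioned above, while the sample window is a placeholder assumption to avoid paging on a single spike.

```python
LAG_THRESHOLD = 20  # typical lag for this provider stays below this value

def should_alert(lag_samples, window=6):
    """Alert only when the last `window` lag samples all exceed the
    threshold, so one transient spike does not trigger a page."""
    if len(lag_samples) < window:
        return False
    return all(lag > LAG_THRESHOLD for lag in lag_samples[-window:])
```

In a real deployment this logic would typically live in the alerting system itself (e.g. a rule with a sustained-duration condition) rather than in application code.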
Added additional alerts from metrics collected by the telemetry service. The Probelab data probably does not apply anymore.
The telemetry service can poll the head advertisement from NFT.Storage, extract some multihashes from it, and then look those multihashes up. An alert can be generated if the multihashes still cannot be looked up after some amount of time. Alternatively, the NFT.Storage provider distance can be tracked, and an alert generated if the distance grows too large.
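The first variant above can be sketched as follows. Everything here is an assumption for illustration: `lookup` stands in for querying the indexer for a multihash, `publish_times` maps each multihash from the head advertisement to when the telemetry service first saw it, and the 30-minute timeout is a placeholder for the unspecified "some amount of time".

```python
import time

LOOKUP_TIMEOUT = 30 * 60  # seconds; placeholder for the alerting grace period

def overdue_multihashes(publish_times, lookup, now=None):
    """Return multihashes that still cannot be looked up after the timeout.

    publish_times: dict mapping multihash -> unix time it was first seen
    lookup: callable returning True if the multihash resolves on the indexer
    """
    now = now if now is not None else time.time()
    overdue = []
    for mh, seen_at in publish_times.items():
        if now - seen_at > LOOKUP_TIMEOUT and not lookup(mh):
            overdue.append(mh)
    return overdue
```

A non-empty result would drive the alert; the distance-based variant would instead compare the provider's last-processed advertisement against the head and alert when the gap exceeds a threshold.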