Create new engineering internal monitoring & dashboard #364

Robcwilliams · 2018-01-05T14:43:23Z

Engineering would like to add more layers of internal monitoring, for awareness and early detection of problems.

- Monitor for RDS going down.
- Monitor for a widespread Amazon outage in a region where we are hosted.
-- https://status.aws.amazon.com/
- Identify exec-level items that need to happen, or playbooks that need to be run, before escalating and paging our engineers.
-- These self-healing playbooks should be written/provided by Insights eng.
- Monitor and report the number of 500 errors per night.
- Monitor and report the total number of uploads processed nightly.
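For the two nightly reports, a minimal sketch of the counting step, assuming a hypothetical access-log line format (`<ISO timestamp> <method> <path> <status>`), a hypothetical `/uploads` endpoint, and a "night" window of 00:00 to 06:00 UTC. None of these come from our real logs; they would need to be swapped for the actual schema and window.

```python
import re
from collections import Counter
from datetime import datetime, time

# Hypothetical log format (an assumption, not our real schema), e.g.:
# 2018-01-05T02:13:44Z POST /uploads 201
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def nightly_report(lines, start=time(0, 0), end=time(6, 0)):
    """Count 500 errors and successful uploads with a timestamp in [start, end).

    Lines that do not match the assumed format are counted as "unparsed"
    (regardless of time window) so that log-format drift stays visible.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            counts["unparsed"] += 1
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ")
        if not (start <= ts.time() < end):
            continue  # outside the nightly window
        status = int(m["status"])
        if status == 500:
            counts["errors_500"] += 1
        # Hypothetical: a processed upload is a 2xx POST to /uploads.
        if m["method"] == "POST" and m["path"].startswith("/uploads") and 200 <= status < 300:
            counts["uploads_processed"] += 1
    return {
        "errors_500": counts["errors_500"],
        "uploads_processed": counts["uploads_processed"],
        "unparsed": counts["unparsed"],
    }
```

Counting only status 500 follows the wording of this issue; widening it to all 5xx responses would be a one-line change.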
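For the "run a playbook before paging engineers" item, a minimal sketch of the flow: run each self-healing step in order, re-check health after each one, and page a human only if nothing fixed it. The health check reads a dict shaped like the `DBInstances` / `DBInstanceStatus` fields of a boto3 `describe_db_instances()` response. The function names (`rds_healthy`, `handle_alert`) and the `page` callback are hypothetical, and the actual playbook steps would come from Insights eng.

```python
from typing import Callable, Sequence


def rds_healthy(response: dict) -> bool:
    """Hypothetical check: True only if at least one instance exists and all are "available".

    Expects a dict shaped like boto3's rds.describe_db_instances() output.
    """
    instances = response.get("DBInstances", [])
    return bool(instances) and all(
        db.get("DBInstanceStatus") == "available" for db in instances
    )


def handle_alert(
    check: Callable[[], bool],
    playbook: Sequence[Callable[[], None]],
    page: Callable[[str], None],
) -> str:
    """Run self-healing steps in order; escalate via page() only if none restores health."""
    if check():
        return "healthy"
    for step in playbook:
        step()
        if check():
            name = getattr(step, "__name__", repr(step))
            return f"recovered by {name}"
    page("still unhealthy after running the playbook")
    return "escalated"
```

Passing the check, the steps, and the paging hook in as callables keeps the flow independent of any particular alerting tool.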

Robcwilliams added the 4.0 label Jan 5, 2018
