Logging & monitoring

Liz Krznarich edited this page Nov 2, 2023 · 2 revisions

Datadog log/metric aggregation

Load balancer and container logs for the ROR API and Generate ID apps are forwarded from AWS to Datadog's log aggregation service via the Datadog Forwarder Lambda function
Metrics for all services that have metrics configured are forwarded to Datadog via the Datadog AWS integration (note: although the docs say this integration also forwards logs, it does not)
We use Datadog's EU service, accessible at https://datadoghq.eu
Each user should have their own Datadog account; contact the tech lead to get one
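
Because the account lives on Datadog's EU site, scripts that call the Datadog API need to target the EU API host rather than the US one. The following is a minimal sketch, not part of any existing ROR tooling: `build_validate_request` is an illustrative helper and `DD_API_KEY` is a hypothetical environment variable name. It builds, but does not send, a request to Datadog's API key validation endpoint on the EU site.

```python
import os
import urllib.request

# API host for the EU site (the US site uses api.datadoghq.com instead).
DATADOG_EU_API = "https://api.datadoghq.eu"


def build_validate_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for Datadog's API key validation endpoint."""
    return urllib.request.Request(
        f"{DATADOG_EU_API}/api/v1/validate",
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


if __name__ == "__main__":
    # DD_API_KEY is a hypothetical variable name used only for this example.
    request = build_validate_request(os.environ.get("DD_API_KEY", ""))
    print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual check; keeping the host in one constant makes it harder to fall back to the US endpoint by accident.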

CloudWatch logs

For most other services, logs are available in AWS CloudWatch
Historically, log configuration for new infrastructure was spotty, so some services don't have logging configured

Monitoring

Uptime for the API and Reconciler is monitored with Pingdom
Public status page is https://aws.amazon.com/pm/cloudwatch
Pingdom downtime alerts are sent to the #status Slack channel
Elasticsearch CPU is monitored via Datadog. An alert is posted to the #status Slack channel when CPU usage stays above 80% for more than 5 minutes
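
The Elasticsearch CPU rule above can be read as a "sustained breach" condition. The sketch below illustrates that reading only; it is not the actual Datadog monitor definition, whose aggregation (for example, an average over the window) may differ. It assumes CPU samples arrive as time-ordered `(timestamp, cpu_percent)` pairs, and `should_alert` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

CPU_THRESHOLD_PERCENT = 80.0          # threshold stated in the rule above
SUSTAINED_FOR = timedelta(minutes=5)  # duration stated in the rule above


def should_alert(samples: Iterable[Tuple[datetime, float]]) -> bool:
    """Return True if CPU stayed above the threshold for longer than SUSTAINED_FOR.

    `samples` must be ordered by timestamp. Any sample at or below the
    threshold resets the breach window.
    """
    breach_started: Optional[datetime] = None
    for timestamp, cpu in samples:
        if cpu > CPU_THRESHOLD_PERCENT:
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started > SUSTAINED_FOR:
                return True
        else:
            breach_started = None
    return False
```

Under this reading, a short spike never alerts, and a single dip below the threshold restarts the five-minute clock.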