
Logging & monitoring

Liz Krznarich edited this page Nov 2, 2023 · 2 revisions

Datadog log/metric aggregation

  • Load balancer and container logs for the ROR API and Generate ID apps are forwarded from AWS to the Datadog log aggregation service via the Datadog Forwarder Lambda function
  • Metrics for all services that have metrics configured are forwarded to Datadog via the Datadog AWS integration (note: although the docs say this integration also forwards logs, it doesn't)
  • We use Datadog's EU service, accessible at https://datadoghq.eu
  • Each user should have their own Datadog account; contact the tech lead to get one
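
Because we are on the EU service, API calls have to go to the EU host rather than the default US one. The following is a minimal sketch, not part of our tooling: the helper names datadog_api_url and build_validate_request are hypothetical, and the API key is a placeholder you would supply yourself. It builds (but does not send) a request to Datadog's API key validation endpoint on the EU site.

```python
import urllib.request

# Assumption: we only ever talk to the EU site mentioned above.
DATADOG_SITE = "datadoghq.eu"


def datadog_api_url(path, site=DATADOG_SITE):
    """Return the full API URL for a path on the given Datadog site."""
    return "https://api.{}/{}".format(site, path.lstrip("/"))


def build_validate_request(api_key, site=DATADOG_SITE):
    """Build (without sending) a request that checks whether an API key is valid."""
    return urllib.request.Request(
        datadog_api_url("/api/v1/validate", site),
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder key for illustration only; nothing is sent over the network.
    req = build_validate_request("<your-api-key>")
    print(req.full_url)
```

To actually run the check, pass the request to urllib.request.urlopen with a real key. A key created on the US site will not validate against the EU host.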

CloudWatch logs

  • For most other services, logs are available in AWS CloudWatch
  • Historically, log configuration for new infrastructure was spotty, so some services don't have logging configured

Monitoring

  • API and Reconciler uptime are monitored with Pingdom
  • Public status page is https://aws.amazon.com/pm/cloudwatch
  • Pingdom downtime alerts are sent to Slack #status channel
  • Elasticsearch CPU is monitored via Datadog. An alert is sent to the Slack #status channel when CPU usage stays above 80% for more than 5 minutes
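
The Elasticsearch CPU rule above can be sketched in plain Python to make the condition concrete. This is an illustration only: the real alert is a Datadog monitor, which evaluates its own aggregated query, and the function name cpu_alert, the sample format and the default values are assumptions taken from the description above rather than from the monitor's configuration.

```python
def cpu_alert(samples, threshold=80.0, window_s=300):
    """Return True if CPU stays above `threshold` for longer than `window_s` seconds.

    `samples` is an iterable of (timestamp_seconds, cpu_percent) pairs,
    sorted by timestamp. This is a hypothetical format chosen for illustration.
    """
    breach_start = None  # timestamp of the first sample in the current breach
    for ts, cpu in samples:
        if cpu > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start > window_s:
                return True
        else:
            # Any sample at or below the threshold resets the breach.
            breach_start = None
    return False


if __name__ == "__main__":
    # One sample per minute at 90% CPU for seven samples (0s to 360s).
    sustained = [(t * 60, 90.0) for t in range(7)]
    print(cpu_alert(sustained))
```

A single sample at or below the threshold resets the timer, so short spikes separated by a dip do not trigger the alert in this sketch.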