MLPAB-1709 - How to get uptime stats (#986)
* create dash and document how to check them
1 parent a223c87, commit 01d59d5
Showing 3 changed files with 84 additions and 27 deletions.
@@ -0,0 +1,21 @@
# Checking service uptime

## Overview

This runbook describes how to check the uptime of the service and of its dependencies.

Health checks are defined in [ADR-007](https://docs.opg.service.justice.gov.uk/documentation/adrs/adr-007.html).

We have metrics for the `/health-check/service` endpoint and the `/health-check/dependencies` endpoint.

Each endpoint is monitored by its own Route53 health check, which runs every 30 seconds. For key environments such as Production, the checks are configured to notify the team via Slack when an endpoint is down.

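For a quick manual spot check of the two endpoints, independent of the Route53 health checks, a minimal Python sketch follows. The hostname `https://example.com` is a placeholder rather than the real service address, and treating an HTTP 200 response as "healthy" is an assumption; ADR-007 remains the authority on what the endpoints return.

```python
import urllib.request

# Paths taken from this runbook.
HEALTH_CHECK_PATHS = ("/health-check/service", "/health-check/dependencies")


def health_check_urls(base_url: str) -> list:
    """Build the two health-check URLs for an environment's base URL."""
    return [base_url.rstrip("/") + path for path in HEALTH_CHECK_PATHS]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers urllib.error.URLError/HTTPError, timeouts and connection errors.
        return False


# Example usage (placeholder hostname, replace with the environment to check):
#   for url in health_check_urls("https://example.com"):
#       print(url, "healthy" if probe(url) else "unhealthy")
```

This only tests reachability from your own machine at one moment, so it complements the 30-second Route53 checks rather than replacing them.
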
The [Route53 health checks](https://us-east-1.console.aws.amazon.com/route53/healthchecks/home?region=us-east-1#/) are managed in the AWS us-east-1 region, and probe the endpoints from locations in the US, EU and Asia.

## Checking the uptime of the service

Each environment has a CloudWatch dashboard, named `health-checks-<environment-name>-environment`, that shows the uptime of the service and its dependencies.

You can access the dashboards here, after logging in and assuming a role in the relevant AWS account:

- [CloudWatch Dashboards](https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#dashboards)
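
The same figure can also be pulled programmatically. The sketch below is one possible approach, not the team's tooling: it builds a CloudWatch `GetMetricStatistics` request for the `HealthCheckPercentageHealthy` metric in the `AWS/Route53` namespace and averages the result. The function names are hypothetical, the health check ID must be looked up in the Route53 console, and `fetch_uptime` assumes `boto3` is installed and AWS credentials for the relevant account are already configured. A one-day period is used because CloudWatch requires periods that are multiples of 3600 seconds for data older than 63 days, and it keeps a 360-day query well under the 1,440-datapoint limit per call.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_uptime_query(health_check_id: str, days: int = 360,
                       now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckPercentageHealthy",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Average"],
    }


def average_uptime(datapoints: list) -> Optional[float]:
    """Mean of the daily averages, or None when no datapoints were returned."""
    values = [point["Average"] for point in datapoints]
    return sum(values) / len(values) if values else None


def fetch_uptime(health_check_id: str, days: int = 360) -> Optional[float]:
    """Query CloudWatch in us-east-1; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    response = cloudwatch.get_metric_statistics(
        **build_uptime_query(health_check_id, days)
    )
    return average_uptime(response["Datapoints"])
```

Averaging daily averages weights every day equally, so treat the result as an approximation of the dashboard value rather than an exact match.
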
@@ -0,0 +1,48 @@
resource "aws_cloudwatch_dashboard" "health_checks" {
  provider = aws.region
  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        x = 0
        y = 0
        width = 12
        height = 6

        properties = {
          sparkline = true,
          view = "singleValue",
          metrics = [
            ["AWS/Route53", "HealthCheckPercentageHealthy", "HealthCheckId", aws_route53_health_check.service_health_check.id, { region = "us-east-1" }]
          ],
          region = "us-east-1",
          start = "-PT8640H",
          end = "P0D",
          period = 300,
          title = "service health-check - average uptime of the service over 12 month window"
        }
      },
      {
        type = "metric"
        x = 0
        y = 6
        width = 12
        height = 5

        properties = {
          sparkline = true,
          view = "singleValue",
          metrics = [
            ["AWS/Route53", "HealthCheckPercentageHealthy", "HealthCheckId", aws_route53_health_check.dependency_health_check.id, { region = "us-east-1" }]
          ],
          region = "us-east-1",
          start = "-PT8640H",
          end = "P0D",
          period = 300,
          title = "dependency health-check - average availability of service dependencies over 12 month window"
        }
      }
    ]
  })
  dashboard_name = "health-checks-${data.aws_default_tags.current.tags.environment-name}-environment"
}
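
As a sanity check on the widget window, the arithmetic behind `start = "-PT8640H"` and `period = 300` can be sketched in Python. The helper `parse_hours_offset` is hypothetical and handles only the `-PT<n>H` form used in the widgets above, not ISO 8601 durations in general.

```python
import re
from datetime import timedelta


def parse_hours_offset(value: str) -> timedelta:
    """Parse a relative offset of the form '-PT<n>H' into a timedelta."""
    match = re.fullmatch(r"-PT(\d+)H", value)
    if match is None:
        raise ValueError(f"unsupported offset: {value!r}")
    return timedelta(hours=int(match.group(1)))


window = parse_hours_offset("-PT8640H")
period_seconds = 300  # matches `period = 300` in the widgets

# Days covered by the window, and the number of 5-minute periods inside it.
print(window.days)
print(int(window.total_seconds()) // period_seconds)
```

The window works out to 360 days, so the "12 month window" in the widget titles is an approximation of a calendar year.
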