MLPAB-1979 - add alarms for 4xx and 5xx opensearch metrics (#1130)
* add alarms for 4xx and 5xx opensearch metrics

* create sns topic and pagerduty integration for opensearch
andrewpearce-digital authored Mar 18, 2024
1 parent db7bcd6 commit 5f778e1
Showing 3 changed files with 97 additions and 2 deletions.
9 changes: 9 additions & 0 deletions terraform/environment/README.md
@@ -126,6 +126,7 @@ For terraform_environment, this will be based on your PR and can be found in the
 | <a name="provider_aws.global"></a> [aws.global](#provider\_aws.global) | 5.41.0 |
 | <a name="provider_aws.management_eu_west_1"></a> [aws.management\_eu\_west\_1](#provider\_aws.management\_eu\_west\_1) | 5.41.0 |
 | <a name="provider_aws.management_global"></a> [aws.management\_global](#provider\_aws.management\_global) | 5.41.0 |
+| <a name="provider_pagerduty"></a> [pagerduty](#provider\_pagerduty) | 3.9.0 |
 ## Modules
@@ -143,6 +144,8 @@ For terraform_environment, this will be based on your PR and can be found in the
 | [aws_backup_plan.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/backup_plan) | resource |
 | [aws_backup_selection.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/backup_selection) | resource |
 | [aws_backup_vault_notifications.aws_backup_failure_events](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/backup_vault_notifications) | resource |
+| [aws_cloudwatch_metric_alarm.opensearch_4xx_errors](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource |
+| [aws_cloudwatch_metric_alarm.opensearch_5xx_errors](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_metric_alarm) | resource |
 | [aws_dynamodb_table.lpas_table](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table) | resource |
 | [aws_dynamodb_table_replica.lpas_table](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table_replica) | resource |
 | [aws_opensearchserverless_access_policy.app](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearchserverless_access_policy) | resource |
@@ -153,9 +156,12 @@ For terraform_environment, this will be based on your PR and can be found in the
 | [aws_opensearchserverless_security_policy.lpas_collection_encryption_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearchserverless_security_policy) | resource |
 | [aws_opensearchserverless_security_policy.lpas_collection_network_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearchserverless_security_policy) | resource |
 | [aws_sns_topic.aws_backup_failure_events](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic) | resource |
+| [aws_sns_topic.opensearch](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic) | resource |
 | [aws_sns_topic_policy.aws_backup_failure_events](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_policy) | resource |
+| [aws_sns_topic_subscription.opensearch](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sns_topic_subscription) | resource |
 | [aws_ssm_parameter.container_version](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ssm_parameter) | resource |
 | [aws_ssm_parameter.dns_target_region](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ssm_parameter) | resource |
+| [pagerduty_service_integration.opensearch](https://registry.terraform.io/providers/PagerDuty/pagerduty/3.9.0/docs/resources/service_integration) | resource |
 | [aws_backup_vault.eu_west_1](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/backup_vault) | data source |
 | [aws_backup_vault.eu_west_2](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/backup_vault) | data source |
 | [aws_caller_identity.eu_west_1](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
@@ -169,7 +175,10 @@ For terraform_environment, this will be based on your PR and can be found in the
 | [aws_kms_alias.dynamodb_encryption_key_eu_west_2](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_alias) | data source |
 | [aws_kms_alias.opensearch](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_alias) | data source |
 | [aws_kms_alias.sns_encryption_key_eu_west_1](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_alias) | data source |
+| [aws_kms_alias.sns_kms_key_alias](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_alias) | data source |
 | [aws_vpc_endpoint.opensearch](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc_endpoint) | data source |
+| [pagerduty_service.main](https://registry.terraform.io/providers/PagerDuty/pagerduty/3.9.0/docs/data-sources/service) | data source |
+| [pagerduty_vendor.cloudwatch](https://registry.terraform.io/providers/PagerDuty/pagerduty/3.9.0/docs/data-sources/vendor) | data source |
 ## Inputs
86 changes: 86 additions & 0 deletions terraform/environment/opensearch.tf
@@ -170,3 +170,89 @@ resource "aws_opensearchserverless_access_policy" "team_breakglas_access" {
   ])
   provider = aws.eu_west_1
 }
+
+resource "aws_cloudwatch_metric_alarm" "opensearch_4xx_errors" {
+  alarm_name                = "${local.environment_name}-opensearch-4xx-errors"
+  alarm_actions             = [aws_sns_topic.opensearch.arn]
+  comparison_operator       = "GreaterThanOrEqualToThreshold"
+  evaluation_periods        = "1"
+  metric_name               = "4xx"
+  namespace                 = "AWS/AOSS"
+  period                    = "30"
+  statistic                 = "Maximum"
+  threshold                 = "1"
+  alarm_description         = "This metric monitors AWS OpenSearch Service 4xx error count for ${local.environment_name}"
+  insufficient_data_actions = []
+  dimensions = {
+    CollectionId   = aws_opensearchserverless_collection.lpas_collection.id
+    CollectionName = aws_opensearchserverless_collection.lpas_collection.name
+  }
+  provider = aws.eu_west_1
+}
+
+resource "aws_cloudwatch_metric_alarm" "opensearch_5xx_errors" {
+  alarm_name                = "${local.environment_name}-opensearch-5xx-errors"
+  alarm_actions             = [aws_sns_topic.opensearch.arn]
+  comparison_operator       = "GreaterThanOrEqualToThreshold"
+  evaluation_periods        = "1"
+  metric_name               = "5xx"
+  namespace                 = "AWS/AOSS"
+  period                    = "30"
+  statistic                 = "Maximum"
+  threshold                 = "1"
+  alarm_description         = "This metric monitors AWS OpenSearch Service 5xx error count for ${local.environment_name}"
+  insufficient_data_actions = []
+  dimensions = {
+    CollectionId   = aws_opensearchserverless_collection.lpas_collection.id
+    CollectionName = aws_opensearchserverless_collection.lpas_collection.name
+  }
+  provider = aws.eu_west_1
+}
+
+data "pagerduty_vendor" "cloudwatch" {
+  name = "Cloudwatch"
+}
+
+data "pagerduty_service" "main" {
+  name = local.environment.pagerduty_service_name
+}
+
+data "aws_kms_alias" "sns_kms_key_alias" {
+  name     = "alias/${local.default_tags.application}_sns_secret_encryption_key"
+  provider = aws.eu_west_1
+}
+
+resource "aws_sns_topic" "opensearch" {
+  name                                     = "${local.environment_name}-opensearch-alarms"
+  kms_master_key_id                        = data.aws_kms_alias.sns_kms_key_alias.target_key_id
+  application_failure_feedback_role_arn    = data.aws_iam_role.sns_failure_feedback.arn
+  application_success_feedback_role_arn    = data.aws_iam_role.sns_success_feedback.arn
+  application_success_feedback_sample_rate = 100
+  firehose_failure_feedback_role_arn       = data.aws_iam_role.sns_failure_feedback.arn
+  firehose_success_feedback_role_arn       = data.aws_iam_role.sns_success_feedback.arn
+  firehose_success_feedback_sample_rate    = 100
+  http_failure_feedback_role_arn           = data.aws_iam_role.sns_failure_feedback.arn
+  http_success_feedback_role_arn           = data.aws_iam_role.sns_success_feedback.arn
+  http_success_feedback_sample_rate        = 100
+  lambda_failure_feedback_role_arn         = data.aws_iam_role.sns_failure_feedback.arn
+  lambda_success_feedback_role_arn         = data.aws_iam_role.sns_success_feedback.arn
+  lambda_success_feedback_sample_rate      = 100
+  sqs_failure_feedback_role_arn            = data.aws_iam_role.sns_failure_feedback.arn
+  sqs_success_feedback_role_arn            = data.aws_iam_role.sns_success_feedback.arn
+  sqs_success_feedback_sample_rate         = 100
+  provider                                 = aws.eu_west_1
+}
+
+resource "pagerduty_service_integration" "opensearch" {
+  name    = "Modernising LPA ${local.environment_name} OpenSearch Alarm"
+  service = data.pagerduty_service.main.id
+  vendor  = data.pagerduty_vendor.cloudwatch.id
+}
+
+resource "aws_sns_topic_subscription" "opensearch" {
+  topic_arn              = aws_sns_topic.opensearch.arn
+  protocol               = "https"
+  endpoint_auto_confirms = true
+  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.opensearch.integration_key}/enqueue"
+  provider               = aws.eu_west_1
+}
4 changes: 2 additions & 2 deletions terraform/environment/region/pagerduty.tf
@@ -37,7 +37,7 @@ resource "aws_sns_topic_subscription" "cloudwatch_application_insights" {
   topic_arn              = data.aws_sns_topic.cloudwatch_application_insights.arn
   protocol               = "https"
   endpoint_auto_confirms = true
-  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.ecs_autoscaling_alarms.integration_key}/enqueue"
+  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.cloudwatch_application_insights[0].integration_key}/enqueue"
   provider               = aws.region
 }

@@ -72,6 +72,6 @@ resource "aws_sns_topic_subscription" "event_alarms" {
   topic_arn              = aws_sns_topic.event_alarms.arn
   protocol               = "https"
   endpoint_auto_confirms = true
-  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.ecs_autoscaling_alarms.integration_key}/enqueue"
+  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.event_alarms.integration_key}/enqueue"
   provider               = aws.region
 }
