MLPAB-1950 - Build index from DynamoDB export (#1279)
* enable ingestion

* add permissions for working with exports

* add export config

* use bucket name
turn off stream processing
use correct index name

* allow uploads

* create policy before pipeline

* update feedback

* update config

* remove prefix

* apply a convention to naming resources

* Restrict which index the pipeline can write to

* add adr for naming and include keys for pipeline

* limit what is put into index

* turn off pipeline and stream processing

* recreate document id as created by the app

* recreate document id as created by the app

* Disable app indexing
Enable stream processing to index

* Update field name

* enable pipeline for export and stream processing

* build without pipeline

* enable pipeline for testing

* add some notes for working with dev mode

* add instructions for reindexing

* add instructions for reindexing

* fix readme index

* add consequences to adr
andrewpearce-digital authored Jul 2, 2024
1 parent c5a472e commit 00c1168
Showing 10 changed files with 253 additions and 16 deletions.
72 changes: 72 additions & 0 deletions docs/architecture/decisions/0005-namespacing-resources-in-aws.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# 5. Namespacing resources in AWS

Date: 2024-06-19

## Status

Accepted

## Context

This decision relates to the name attribute of a resource, not the Terraform resource label. In the example below, the name attribute is `event-received-${data.aws_default_tags.current.tags.environment-name}` and the resource label is `event_received`.

```hcl
resource "aws_iam_role" "event_received" {
name = "event-received-${data.aws_default_tags.current.tags.environment-name}"
...
}
```

Making resource names in AWS unique is important to avoid conflicts across environments, regions, and accounts. This is especially important for resources that are shared across multiple environments, such as encryption keys.

Granting access to resources is also easier when they are namespaced because it is easier to identify which resources are being accessed.

To make granting access to resources easier, we should use a consistent naming convention for resources.

The values currently used in naming resources are:

- `environment-name`
- `region`
- `resource-name`
- `account-name`
- `application-name`

Some examples of namespaced IAM role resources are:

- `event-received-${data.aws_default_tags.current.tags.environment-name}`
- `${data.aws_default_tags.current.tags.environment-name}-execution-role`
- `${data.aws_default_tags.current.tags.environment-name}-execution-role-${data.aws_region.current.name}`
- `batch-manifests-${data.aws_default_tags.current.tags.application}-${data.aws_default_tags.current.tags.account-name}-${data.aws_region.current.name}/*`

IAM policies support wildcards in the resource name, so we can use wildcards to grant access to all resources that match a pattern. This is useful when granting access to resources that are created dynamically.
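As a sketch of this pattern (the data source name `batch_manifests_read` and the policy scope are illustrative, not taken from this codebase), a wildcard over the naming convention might look like:

```hcl
# Illustrative only: grants read access to every object in buckets that
# follow the "batch-manifests-<application>-..." naming convention,
# regardless of which account or region suffix they carry.
data "aws_iam_policy_document" "batch_manifests_read" {
  statement {
    effect  = "Allow"
    actions = ["s3:GetObject"]
    resources = [
      "arn:aws:s3:::batch-manifests-${data.aws_default_tags.current.tags.application}-*/*",
    ]
  }
}
```

Because the convention puts the stable, descriptive part of the name first, a single wildcard suffix covers the dynamically created variants.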

Adopting a consistent naming convention for resources will make it easier to grant access to resources and avoid conflicts.

## Decision

Use a consistent naming convention for resources in AWS. The naming convention should include the following values in this order:

- `resource-name` describing the role/function of the resource (e.g. `event-received`)
- `application-name` which is the product name (e.g. `opg-modernising-lpa`)
- `account-name` which is the AWS account name (e.g. `development`)
- `region-name` which is the AWS region name (e.g. `eu-west-1`)
- `environment-name` which is the environment name (e.g. `production`)

It isn't necessary to include resource type in the name because the resource type is already specified in the resource definition.

`application-name` and `account-name` will be used when a resource name must be globally unique, such as an S3 bucket name. They can be omitted if the resource is not globally unique.

(a consequence is that the application name currently used in `aws_kms_alias` is not necessary)

`account-name` should be used for resources shared at account level.

`region-name` can be omitted if the resource is not region-specific.

`environment-name` should be used for resources that are environment-specific.

(a consequence is that a resource will have either `environment-name` or `account-name`, and is unlikely to have both)
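A minimal sketch of the convention applied to both cases (the `assume_role_policy` reference is illustrative; the name attributes follow the examples above):

```hcl
# Environment-specific resource: resource-name + environment-name.
resource "aws_iam_role" "event_received" {
  name               = "event-received-${data.aws_default_tags.current.tags.environment-name}"
  assume_role_policy = data.aws_iam_policy_document.event_received.json
}

# Globally unique resource: resource-name + application-name + account-name + region-name.
resource "aws_s3_bucket" "dynamodb_exports" {
  bucket = "dynamodb-exports-${data.aws_default_tags.current.tags.application}-${data.aws_default_tags.current.tags.account-name}-${data.aws_region.current.name}"
}
```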

## Consequences

- Resources will be easier to identify and grant access to
- Some resources will need renaming
10 changes: 9 additions & 1 deletion docs/runbooks/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,12 @@
Index of available runbooks

* [Example](./README.md)
* [Managing node versions](./managing_node_versions.md)
* [Changes to existing GOV.UK Notify SMS and email templates](changes_to_existing_notify_templates.md)
* [Checking service uptime](checking_service_uptime.md)
* [Configuring weblate access to manage translations and merge conflicts](configuring_weblate_access.md)
* [Deploying into additional regions](deployment_into_additional_regions.md)
* [Disabling DynamoDB Global Tables procedure](disabling_dynamodb_global_tables.md)
* [Manage Maintenance Mode](manage_maintenance_mode.md)
* [Managing node versions for development](./managing_node_versions.md)
* [Rebuilding the OpenSearch index](rebuilding_the_opensearch_index.md)
* [Recovering a deleted organisation](recovering_a_deleted_organisation.md)
61 changes: 61 additions & 0 deletions docs/runbooks/rebuilding_the_opensearch_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
# Rebuilding the OpenSearch index

In the event that the OpenSearch index needs to be rebuilt, follow the steps below.

These instructions are for the `test` environment, whose index is called `lpas_v2_test`. The same steps apply to the other environments, using the appropriate index name `lpas_v2_<environment name>`.

1. Delete the existing index

```shell
DELETE /lpas_v2_test
```

1. Create a new index with the correct mapping

```shell
PUT /lpas_v2_test
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"PK": {"type": "keyword"},
"SK": {"type": "keyword"},
"Donor.FirstNames": {"type": "keyword"},
"Donor.LastName": {"type": "keyword"}
}
}
}
```

1. Recreate the OpenSearch pipeline

We do this by using Terraform to taint and recreate the pipeline.

In a shell, navigate to the `terraform/environment` directory and select the correct workspace:

```shell
tf workspace select <environment name>
```
(Working with preproduction and production environments requires the breakglass role.)

Mark the pipeline for recreation:

```shell
tf taint 'aws_osis_pipeline.lpas_stream[0]'
```

Then apply the changes:

```shell
tf apply
```
1. Reindex

When the pipeline is created, it triggers a DynamoDB export to S3. Once the export has finished, the pipeline imports the exported data into the index. After the export has been processed, the pipeline switches to processing DynamoDB stream events, if enabled.
5 changes: 2 additions & 3 deletions scripts/pull-av-scan-zip-packages.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#!/usr/bin/env bash
#! /usr/bin/env bash
# Bash script to pull S3 Antivirus Scan Zip Packages for Lambda
# Ensures that tfswitch, jq, versions.tf are present before running
#

set -e
Expand All @@ -9,7 +8,7 @@ key="/opg-s3-antivirus/zip-version-main"
value=$(aws-vault exec management-operator -- aws ssm get-parameter --name "$key" --query 'Parameter.Value' --output text 2>/dev/null || true)
echo "Using $key: $value"

echo "Pulling Antivirus lambda and Layer version: $value"
echo "Pulling antivirus lambda zip and layer version: $value"
wget -q -O ./region/modules/s3_antivirus/lambda_layer.zip https://github.com/ministryofjustice/opg-s3-antivirus/releases/download/"$value"/lambda_layer-amd64.zip
wget -q -O ./region/modules/s3_antivirus/lambda_layer.zip.sha256sum https://github.com/ministryofjustice/opg-s3-antivirus/releases/download/"$value"/lambda_layer-amd64.zip.sha256sum
(cd ./region/modules/s3_antivirus/ && sha256sum -c "lambda_layer.zip.sha256sum")
Expand Down
6 changes: 4 additions & 2 deletions terraform/environment/.envrc
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
#!/usr/bin/bash

source ../../scripts/switch-terraform-version.sh
source ../../scripts/pull-av-scan-zip-packages.sh
export TF_CLI_ARGS_init="-backend-config=role_arn=arn:aws:iam::311462405659:role/operator -upgrade -reconfigure"
export TF_VAR_default_role=operator
export TF_VAR_pagerduty_api_key=$(aws-vault exec mlpa-dev -- aws secretsmanager get-secret-value --secret-id "pagerduty_api_key" | jq -r .'SecretString')
export TF_VAR_container_version=$(aws-vault exec management-global -- aws ssm get-parameter --name "/modernising-lpa/container-version/production" --query 'Parameter.Value' --output text)
echo "Deploying Version: $TF_VAR_container_version"
echo "Deploying Modernising LPA version: $TF_VAR_container_version"
source ../../scripts/pull-av-scan-zip-packages.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
resource "aws_iam_role" "opensearch_pipeline" {
name = "${data.aws_default_tags.current.tags.environment-name}-opensearch-pipeline-role"
name = "opensearch-pipeline-role-${data.aws_default_tags.current.tags.environment-name}"
assume_role_policy = data.aws_iam_policy_document.opensearch_pipeline.json
provider = aws.global
}
Expand Down
70 changes: 61 additions & 9 deletions terraform/environment/opensearch_ingestion_pipeline.tf
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
locals {
enable_opensearch_ingestion_pipeline = false
enable_opensearch_ingestion_pipeline = true
}

data "aws_kms_alias" "dynamodb_encryption_key" {
Expand All @@ -12,8 +12,17 @@ data "aws_kms_alias" "opensearch_encryption_key" {
provider = aws.eu_west_1
}

data "aws_kms_alias" "dynamodb_exports_s3_bucket_encryption_key" {
name = "alias/${local.default_tags.application}-dynamodb-exports-s3-bucket-encryption"
provider = aws.eu_west_1
}

data "aws_s3_bucket" "dynamodb_exports_bucket" {
bucket = "dynamodb-exports-${local.default_tags.application}-${local.default_tags.account-name}-eu-west-1"
provider = aws.eu_west_1
}

resource "aws_iam_role_policy" "opensearch_pipeline" {
count = local.enable_opensearch_ingestion_pipeline ? 1 : 0
name = "opensearch_pipeline"
role = module.global.iam_roles.opensearch_pipeline.name
policy = data.aws_iam_policy_document.opensearch_pipeline.json
Expand All @@ -30,7 +39,9 @@ data "aws_iam_policy_document" "opensearch_pipeline" {
"aoss:BatchGetCollection",
"aoss:APIAccessAll"
]
resources = [data.aws_opensearchserverless_collection.lpas_collection.arn]
resources = [
data.aws_opensearchserverless_collection.lpas_collection.arn
]
}

statement {
Expand All @@ -56,14 +67,26 @@ data "aws_iam_policy_document" "opensearch_pipeline" {
"dynamodb:DescribeTable",
"dynamodb:DescribeContinuousBackups",
"dynamodb:ExportTableToPointInTime",
"dynamodb:ListExports",
]
resources = [
aws_dynamodb_table.lpas_table.arn,
]
}

statement {
sid = "DynamoDBEncryptionAccess"
sid = "DescribeExports"
effect = "Allow"
actions = [
"dynamodb:DescribeExport",
]
resources = [
"${aws_dynamodb_table.lpas_table.arn}/export/*",
]
}

statement {
sid = "DynamoDBAndExportEncryptionAccess"
effect = "Allow"
actions = [
"kms:Decrypt",
Expand All @@ -84,10 +107,10 @@ data "aws_iam_policy_document" "opensearch_pipeline" {
]
resources = [
data.aws_kms_alias.opensearch_encryption_key.target_key_arn,
data.aws_kms_alias.dynamodb_exports_s3_bucket_encryption_key.target_key_arn
]
}


statement {
sid = "allowReadFromStream"
effect = "Allow"
Expand All @@ -100,6 +123,24 @@ data "aws_iam_policy_document" "opensearch_pipeline" {
"${aws_dynamodb_table.lpas_table.arn}/stream/*",
]
}

statement {
sid = "allowReadAndWriteToS3ForExport"
effect = "Allow"
actions = [
"s3:HeadBucket",
"s3:GetObject",
"s3:CreateMultipartUpload",
"s3:AbortMultipartUpload",
"s3:UploadPart",
"s3:PutObject",
"s3:PutObjectAcl"
]
resources = [
data.aws_s3_bucket.dynamodb_exports_bucket.arn,
"${data.aws_s3_bucket.dynamodb_exports_bucket.arn}/*",
]
}
}

data "aws_vpc" "main" {
Expand Down Expand Up @@ -160,7 +201,9 @@ locals {
lpas_stream_pipeline_configuration_template_vars = {
source = {
tables = {
table_arn = aws_dynamodb_table.lpas_table.arn
table_arn = aws_dynamodb_table.lpas_table.arn
s3_bucket_name = data.aws_s3_bucket.dynamodb_exports_bucket.id
s3_sse_kms_key_id = data.aws_kms_alias.dynamodb_exports_s3_bucket_encryption_key.target_key_arn
stream = {
start_position = "LATEST"
}
Expand All @@ -177,8 +220,9 @@ locals {

sink = {
opensearch = {
hosts = data.aws_opensearchserverless_collection.lpas_collection.collection_endpoint
index = "lpas"
hosts = data.aws_opensearchserverless_collection.lpas_collection.collection_endpoint
index = "lpas_v2_${local.environment_name}"
document_id = "$${/DocumentID}"
aws = {
sts_role_arn = module.global.iam_roles.opensearch_pipeline.arn
region = "eu-west-1"
Expand All @@ -201,7 +245,9 @@ resource "aws_opensearchserverless_access_policy" "pipeline" {
Rules = [
{
ResourceType = "index",
Resource = ["index/collection-${local.environment_name}/*"],
Resource = [
"index/shared-collection-${local.environment.account_name}/lpas_v2_${local.environment_name}",
],
Permission = [
"aoss:CreateIndex",
"aoss:UpdateIndex",
Expand Down Expand Up @@ -237,5 +283,11 @@ resource "aws_osis_pipeline" "lpas_stream" {
security_group_ids = [aws_security_group.opensearch_ingestion[0].id]
subnet_ids = data.aws_subnet.application[*].id
}
depends_on = [
aws_opensearchserverless_access_policy.pipeline,
aws_iam_role_policy.opensearch_pipeline,
aws_security_group.opensearch_ingestion,
]

provider = aws.eu_west_1
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@ dynamodb-pipeline:
acknowledgments: true
tables:
- table_arn: ${source.tables.table_arn}
export:
s3_bucket: ${source.tables.s3_bucket_name}
s3_sse_kms_key_id: ${source.tables.s3_sse_kms_key_id}
stream:
start_position: ${source.tables.stream.start_position}
aws:
Expand All @@ -13,11 +16,46 @@ dynamodb-pipeline:
routes:
- lay_journey_lpas: ${routes.lay_journey_lpas}
- supporter_journey_lpas: ${routes.supporter_journey_lpas}
processor:
- select_entries:
include_keys: [
"PK",
"SK",
"Donor",
]
- copy_values:
entries:
- from_key: "PK"
to_key: "DocumentID"
- substitute_string:
entries:
- source: "DocumentID"
from: "LPA#"
to: "LPA--"
sink:
- opensearch:
hosts: ["${sink.opensearch.hosts}"]
index: ${sink.opensearch.index}
routes: ["lay_journey_lpas", "supporter_journey_lpas"]
template_type: index-template
template_content: >
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"mappings": {
"properties": {
"PK": {"type": "keyword"},
"SK": {"type": "keyword"},
"Donor.FirstNames": {"type": "keyword"},
"Donor.LastName": {"type": "keyword"}
}
}
}
document_id: ${sink.opensearch.document_id}
aws:
sts_role_arn: ${sink.opensearch.aws.sts_role_arn}
region: ${sink.opensearch.aws.region}
Expand Down
