# Add Terraform configuration for AWS Batch #34
## New file: deployment documentation
# Deployment

- [AWS Credentials](#aws-credentials)
- [Publish Container Images](#publish-container-images)
- [Terraform](#terraform)
- [Database Migrations](#database-migrations)

## AWS Credentials

Follow [these](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html#sso-configure-profile) instructions to configure a named AWS profile (the resulting profile is sketched below):

- Use https://d-906762f877.awsapps.com/start as the SSO start URL.
- Use `us-east-1` as the SSO region.
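The SSO wizard writes the named profile to `~/.aws/config`. A rough sketch of the result, assuming the legacy (non-`sso-session`) profile format; the profile name, account ID, and role name below are placeholders:

```ini
[profile my-profile]
sso_start_url  = https://d-906762f877.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 111111111111
sso_role_name  = ExampleRoleName
region         = us-east-1
output         = json
```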
Use the `aws sso login` command to refresh your login if your credentials expire:

```console
$ aws sso login --profile my-profile
```
## Publish Container Images

Build a container image for the Python application (`cibuild`) and publish it to Amazon ECR (`cipublish`):

```console
$ ./scripts/cibuild
...
 => => naming to docker.io/library/image-deid-etl:da845bf
$ ./scripts/cipublish
```

## Terraform

Launch an instance of the included Terraform container image:

```console
$ docker-compose -f docker-compose.ci.yml run --rm terraform
bash-5.1#
```

Once inside the container, set `GIT_COMMIT` to the tag of a published container image (e.g., `da845bf`):

```console
bash-5.1# export GIT_COMMIT=da845bf
```

Finally, use `infra` to generate and apply a Terraform plan:

```console
bash-5.1# ./scripts/infra plan
bash-5.1# ./scripts/infra apply
```

## Database Migrations

Execute database migrations by submitting a Batch job that invokes the application's `initdb` command (an equivalent AWS CLI invocation is sketched after the steps):

- Select the most recent job definition for [jobImageDeidEtl](https://console.aws.amazon.com/batch/home?region=us-east-1#job-definition).
- Select **Submit new job**.
- Fill in the following:
  - **Name**: Any one-off description of the work you're performing, e.g., `initialize-the-database`.
  - **Job queue**: `queueImageDeidEtl`.
  - **Command**: `image-deid-etl initdb`.
- Click **Submit**.
- Monitor the log output of the submitted job by viewing the job detail and clicking the link under **Log group name**.
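Alternatively, the same job can be submitted from the AWS CLI. A sketch, assuming your SSO profile has permission to submit Batch jobs and that the latest active revision of the job definition should be used:

```console
$ aws batch submit-job \
    --profile my-profile \
    --job-name initialize-the-database \
    --job-queue queueImageDeidEtl \
    --job-definition jobImageDeidEtl \
    --container-overrides '{"command": ["image-deid-etl", "initdb"]}'
```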
## New file: Terraform configuration for AWS Batch

```hcl
#
# Security Group resources
#
resource "aws_security_group" "batch" {
  name_prefix = "sgBatchContainerInstance-"
  vpc_id      = var.vpc_id

  tags = {
    Name = "sgBatchContainerInstance"
  }

  lifecycle {
    create_before_destroy = true
  }
}

#
# Batch resources
#
resource "aws_launch_template" "default" {
  name_prefix = "ltBatchContainerInstance-"

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = var.batch_root_block_device_size
      volume_type = var.batch_root_block_device_type
    }
  }

  user_data = base64encode(file("cloud-config/batch-container-instance"))
}

resource "aws_batch_compute_environment" "default" {
  compute_environment_name_prefix = "batch${local.short}-"
  type                            = "MANAGED"
  state                           = "ENABLED"
  service_role                    = aws_iam_role.batch_service_role.arn

  compute_resources {
    type                = "SPOT"
    allocation_strategy = var.batch_spot_fleet_allocation_strategy
    bid_percentage      = var.batch_spot_fleet_bid_percentage
```
> **Comment on lines +42 to +44:** I've been using the capacity-optimized allocation strategy since it was introduced in 2019. At a high level, the capacity-optimized strategy will try to get the smallest instance type that will fit your workload, but if none are available at your spot bid percentage, it will grab a larger instance type so that jobs don't sit in the queue. Since we supply this via an input variable, we can make adjustments by modifying the variable's value.
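The variable definitions themselves are not part of this diff; a minimal sketch of what they might look like. The names come from the `var.` references above, but the types and defaults here are assumptions:

```hcl
# Sketch only: defaults are illustrative, not the values used in this project.
variable "batch_spot_fleet_allocation_strategy" {
  type        = string
  default     = "SPOT_CAPACITY_OPTIMIZED"
  description = "Allocation strategy for the Spot compute environment."
}

variable "batch_spot_fleet_bid_percentage" {
  type        = number
  default     = 64
  description = "Maximum Spot price as a percentage of the On-Demand price."
}
```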
```hcl
    ec2_configuration {
      image_type = "ECS_AL2"
    }
```
> **Comment on lines +46 to +48:** This tells Batch to use the latest version of the ECS-optimized Amazon Linux 2 AMI.
```hcl
    ec2_key_pair = aws_key_pair.bastion.key_name

    min_vcpus = var.batch_min_vcpus
    max_vcpus = var.batch_max_vcpus

    launch_template {
      launch_template_id = aws_launch_template.default.id
      version            = aws_launch_template.default.latest_version
    }

    spot_iam_fleet_role = aws_iam_role.spot_fleet_service_role.arn
    instance_role       = aws_iam_instance_profile.ecs_instance_role.arn

    instance_type = var.batch_instance_types

    security_group_ids = [aws_security_group.batch.id]
    subnets            = var.vpc_private_subnet_ids

    tags = {
      Name        = "BatchWorker"
      Project     = var.project
      Environment = var.environment
    }
  }

  depends_on = [aws_iam_role_policy_attachment.batch_service_role_policy]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_batch_job_queue" "default" {
  name                 = "queue${local.short}"
  priority             = 1
  state                = "ENABLED"
  compute_environments = [aws_batch_compute_environment.default.arn]
}

resource "aws_batch_job_definition" "default" {
  name = "job${local.short}"
  type = "container"

  container_properties = templatefile("${path.module}/job-definitions/image-deid-etl.json.tmpl", {
    image_url = "${module.ecr.repository_url}:${var.image_tag}"

    image_deid_etl_vcpus  = var.image_deid_etl_vcpus
    image_deid_etl_memory = var.image_deid_etl_memory

    database_url = "postgresql://${var.rds_database_username}:${var.rds_database_password}@${module.database.hostname}:${module.database.port}/${var.rds_database_name}"

    flywheel_api_key    = var.flywheel_api_key
    flywheel_group      = var.flywheel_group
    orthanc_credentials = var.orthanc_credentials
    orthanc_host        = var.orthanc_host
    orthanc_port        = var.orthanc_port

    phi_data_bucket_name    = var.d3b_phi_data_bucket_name
    subject_id_mapping_path = var.subject_id_mapping_path

    image_deid_etl_log_level = var.image_deid_etl_log_level
  })
}
```
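The job-definition template referenced above is not included in this diff. A rough sketch of what `job-definitions/image-deid-etl.json.tmpl` might contain, assuming the standard Batch container-properties shape; the environment variable names and default command are illustrative only:

```json
{
  "image": "${image_url}",
  "vcpus": ${image_deid_etl_vcpus},
  "memory": ${image_deid_etl_memory},
  "command": ["image-deid-etl"],
  "environment": [
    { "name": "DATABASE_URL", "value": "${database_url}" },
    { "name": "FLYWHEEL_API_KEY", "value": "${flywheel_api_key}" },
    { "name": "FLYWHEEL_GROUP", "value": "${flywheel_group}" },
    { "name": "ORTHANC_CREDENTIALS", "value": "${orthanc_credentials}" },
    { "name": "ORTHANC_HOST", "value": "${orthanc_host}" },
    { "name": "ORTHANC_PORT", "value": "${orthanc_port}" },
    { "name": "PHI_DATA_BUCKET_NAME", "value": "${phi_data_bucket_name}" },
    { "name": "SUBJECT_ID_MAPPING_PATH", "value": "${subject_id_mapping_path}" },
    { "name": "LOG_LEVEL", "value": "${image_deid_etl_log_level}" }
  ]
}
```

At submit time the console (or `aws batch submit-job --container-overrides`) can still override the command, as described in the deployment documentation above.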
## New file: `cloud-config/batch-container-instance`
```text
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

# Manually mount unformatted instance store volumes. Mounting in a cloud-boothook
# makes it more likely the drive is mounted before the Docker daemon and ECS agent
# start, which helps mitigate potential race conditions.
#
# See:
# - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html#bootstrap_docker_daemon
# - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-linux-ami-basics.html#supported-user-data-formats
mkfs.ext4 -E nodiscard /dev/nvme1n1
mkdir -p /media/ephemeral0
mount -t ext4 -o defaults,nofail,discard /dev/nvme1n1 /media/ephemeral0

--==BOUNDARY==
```
> **Comment on lines +1 to +18:** We're using the […]. The instance store volumes are unformatted, so you need to initialize them whenever a container instance comes online. Unfortunately, we can't take advantage of ephemeral storage just yet. As it's currently designed, the ETL will write files to the current working directory (which is […]).
>
> **Reply:** Is there an issue open for this mechanism yet?
>
> **Reply:** Thank you for prompting me. I should've opened an issue when I encountered this. I opened #35.
## Modified file: Terraform database configuration

```diff
@@ -7,7 +7,7 @@ resource "aws_db_subnet_group" "default" {
   subnet_ids = var.vpc_private_subnet_ids
 
   tags = {
-    Name = "dbsngDatabaseServer"
+    Name = "dbsngDatabaseServer"
   }
 }
@@ -57,9 +57,7 @@ resource "aws_db_parameter_group" "default" {
   }
 
   tags = {
-    Name        = "dbpgDatabaseServer"
-    Project     = var.project
-    Environment = var.environment
+    Name = "dbpgDatabaseServer"
   }
 
   lifecycle {
```

> **Comment:** Formatting tweaks that can be ignored.
> **Comment:** The `compute_environment_name_prefix` combined with the `lifecycle` meta-argument allows you to change the CE without Terraform getting stuck. Terraform will detach the deposed CE from the job queue, and queued jobs will hang in the `RUNNABLE` status. Then, once the new CE is attached to the job queue, the juices will start flowing.