Added feature icav2 copy batch state machine
alexiswl committed Mar 9, 2024
1 parent bb18178 commit bf79e31
Showing 27 changed files with 2,353 additions and 0 deletions.
8 changes: 8 additions & 0 deletions lib/workload/stateless/icav2_copy_batch_utility/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
cdk.out
6 changes: 6 additions & 0 deletions lib/workload/stateless/icav2_copy_batch_utility/.npmignore
@@ -0,0 +1,6 @@
*.ts
!*.d.ts

# CDK asset staging directory
.cdk.staging
cdk.out
145 changes: 145 additions & 0 deletions lib/workload/stateless/icav2_copy_batch_utility/Readme.md
@@ -0,0 +1,145 @@
# ICAv2 Copy Batch Utility

## Overview

The ICAv2 copy batch utility is a CDK stack that wraps an AWS Step Function around the ICAv2 CopyBatch API.
This API is designed to copy a list of Illumina file ids into a directory.

We use this API by taking in a manifest, a mapping from each source to a list of its destinations
(a source may have multiple destinations if it needs to be copied into a few different places),
and then monitoring the resulting set of API jobs to completion.

These CopyBatch API jobs have about a 20% failure rate, so we sometimes need to resubmit them; resubmission is built into the step function.

A 20% failure rate seems quite high, but when they work, these jobs can transfer 80 Gb of data in under 10 seconds, so it's worth
persisting with.

When working with the ICAv2 CopyBatch API, we need to provide a unique identifier for the run, and a unique location for the outputs.

Once all jobs are launched, we monitor their status (resubmitting any that fail, up to five times) and wait for all jobs to complete.

This process will transfer an entire BCLConvert process from one ICAv2 project to another in under 10 minutes!

Note that a current limitation prevents using this API within the same project.


## Inputs

The state machine expects one of the following inputs:

* `manifest`
* `manifest_b64gz`

An example of each is below:

```json
{
"manifest": {
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz": [
"icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/"
],
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz": [
"icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/"
]
}
}
```

OR

```json
{
"manifest_b64gz": "H4sIAAAAAAAAA+2SzUoDMRhF932KoWvT/P9111ZFod0oqCASMklaBqYzY5IOFvHdHRXBlYtutND9x/3uOdzXUVGMK2d7MoWwJHRdciyAV5wAphUH1jsHFMUOk6CcJwRW9bYBtrH1PoUECcUYCzNDGHFiEJbEzK/ulit+fvsgjSdMcGyY08g7MHf1om36EHPRM0OMBKXE2imvAQ/CDw+DAFaVATiLZcmD9gyVsN3lbpfhst0keGlTfl60264OOUzySx5Pi8cB4QeE5JoHtSaAEk8Bc3LIpIQBJ6zi3gY1JH9CmC5WWxv3kCBCfweBVAgsuP44ZYgiaUvnpdLoq9V4aPB0dlwm721sqmaTJnW7OUk8UOJFjG08KTxQ4U3o2pgTnHnb5RDNKuRYuTRxqf9rnd/Vjszo5wqum3X7vyY5ehu9Ax+WAaFoBgAA"
}
```
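
The name `manifest_b64gz` and the `H4sI` prefix of the example (the base64 form of the gzip magic bytes) suggest that this input is the JSON manifest, gzip-compressed and then base64-encoded. Below is a minimal sketch under that assumption; the function names `encode_manifest` and `resolve_manifest` are hypothetical, and reading "one of" as "exactly one of" is also an assumption.

```python
import base64
import gzip
import json


def encode_manifest(manifest: dict) -> str:
    """Gzip-compress the JSON manifest, then base64-encode it (assumed format)."""
    raw = json.dumps(manifest).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def resolve_manifest(event: dict) -> dict:
    """Return the manifest dict from whichever of the two input keys is present."""
    # Assumption: exactly one of the two keys should be supplied.
    if ("manifest" in event) == ("manifest_b64gz" in event):
        raise ValueError("Provide exactly one of 'manifest' or 'manifest_b64gz'")
    if "manifest" in event:
        return event["manifest"]
    compressed = base64.b64decode(event["manifest_b64gz"])
    return json.loads(gzip.decompress(compressed))
```

The compressed form is useful when the manifest is large, since Step Functions limits the size of the payload passed between states.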

## Outputs

```json
{
"job_status_iterable_parameter": {
"job_status_iterable": 1
},
"wait_parameter": {
"wait": true
},
"counters": {
"jobs_failed": 0,
"jobs_running": 0,
"jobs_passed": 1
},
"job_list_with_attempt_counter": [
{
"job_attempt_counter": 1,
"job_id": "6f3d6981-0dff-4413-8388-2bb445d03dd7",
"failed_jobs_list": [],
"dest_uri": "icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/",
"source_uris": [
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz",
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz"
],
"job_status": true
}
]
}
```

## Lambdas in this directory

All lambdas run on Python 3.11 or higher.

### Flip Manifest

This lambda takes in a manifest and flips its keys and values.
The dictionary becomes a list of objects, where each destination (an original value) is stored under `dest_uri` and
the sources (original keys) that map to it are collected in the list `source_uris`.

In the example above, because both files are heading to the same directory, we get the following output:

```json
[
{
"dest_uri": "icav2://7595e8f2-32d3-4c76-a324-c6a85dae87b5/ilmn_cttso_fastq_cache/20240308abcd1234/L2301368_run_cache/L2301368/",
"source_uris": [
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R1_001.fastq.gz",
"icav2://b23fb516-d852-4985-adcc-831c12e8cd22/ilmn-analyses/231116_A01052_0172_BHVLM5DSX7_d24651_4c90dc-BclConvert v4_2_7-b719c8d9-5e6d-49e6-a8be-ca17b5e9d40b/output/Samples/Lane_1/L2301368/L2301368_S1_L001_R2_001.fastq.gz"
]
}
]
```
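
The flip described above can be sketched as follows. This is an illustration of the transformation only, not the lambda's actual code; the function name `flip_manifest` is hypothetical.

```python
from collections import defaultdict


def flip_manifest(manifest: dict[str, list[str]]) -> list[dict]:
    """Group source URIs by destination URI.

    Input:  {source_uri: [dest_uri, ...], ...}
    Output: [{"dest_uri": ..., "source_uris": [...]}, ...]
    """
    by_dest: dict[str, list[str]] = defaultdict(list)
    for source_uri, dest_uris in manifest.items():
        for dest_uri in dest_uris:
            by_dest[dest_uri].append(source_uri)
    # One object per destination, in first-seen order
    return [
        {"dest_uri": dest_uri, "source_uris": source_uris}
        for dest_uri, source_uris in by_dest.items()
    ]
```

Grouping by destination means each destination folder needs only one copy job, however many files are sent to it.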

### Launch Copy Job

A simple function that takes in a dest uri and a list of source uris, converts them into a folder id and file ids respectively,
and launches the ICAv2 Copy Data Batch Job.

This returns a job id. We tie the job id to the dest uri and source uris (because we cannot collect these from the job itself)
and monitor the job throughout the step function process.

This lambda is called in a map state.
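
Before a URI can be resolved to a folder or file id, it has to be split into its parts. Below is a minimal sketch of that first step, assuming (from the shape of the example URIs) that the host portion of an `icav2://` URI is the project id and the remainder is the path within that project. The function name `split_icav2_uri` is hypothetical, and the lookup of ids through the ICAv2 API is deliberately not shown.

```python
from urllib.parse import urlparse


def split_icav2_uri(uri: str) -> tuple[str, str]:
    """Split 'icav2://<project_id>/<path>' into (project_id, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "icav2" or not parsed.netloc:
        raise ValueError(f"Not an icav2:// URI: {uri}")
    # Assumption: netloc holds the project id, path is relative to the project root
    return parsed.netloc, parsed.path
```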


### Update job session

Goes through the list of jobs (stopping after 10 to prevent timeouts) and:

1. Checks the current job id for the dest uri and source uris combination.
2. If the job has failed, resubmits it, increments the attempt counter, and adds the previous job id to the failed jobs list.
3. If the job has passed, moves the job id to the front, so we don't need to monitor it again.
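
The steps above can be sketched as follows. The job fields match the output example, but several details are assumptions made for illustration: `get_status` and `resubmit` are hypothetical callables standing in for the ICAv2 API calls, the status strings are invented, `job_status` is taken to be `None` while a job is unresolved, and the cap is read as five attempts in total.

```python
from typing import Callable

MAX_ATTEMPTS = 5  # assumption: "up to five times" read as five attempts in total
BATCH_SIZE = 10   # stop polling after this many jobs to avoid lambda timeouts


def update_job_list(
    jobs: list[dict],
    get_status: Callable[[str], str],            # hypothetical: job id -> status string
    resubmit: Callable[[str, list[str]], str],   # hypothetical: returns a new job id
) -> list[dict]:
    """Poll up to BATCH_SIZE unresolved jobs; return resolved jobs first."""
    finished, pending = [], []
    checked = 0
    for job in jobs:
        if job["job_status"] is not None:
            finished.append(job)  # already resolved, nothing to poll
            continue
        if checked >= BATCH_SIZE:
            pending.append(job)  # left for the next invocation
            continue
        checked += 1
        status = get_status(job["job_id"])
        if status == "SUCCEEDED":
            job["job_status"] = True
            finished.append(job)
        elif status == "FAILED":
            job["failed_jobs_list"].append(job["job_id"])
            if job["job_attempt_counter"] >= MAX_ATTEMPTS:
                job["job_status"] = False  # out of attempts
                finished.append(job)
            else:
                job["job_id"] = resubmit(job["dest_uri"], job["source_uris"])
                job["job_attempt_counter"] += 1
                pending.append(job)
        else:
            pending.append(job)  # still running
    return finished + pending
```

Note that this sketch updates the job dictionaries in place. Keeping resolved jobs at the front means later invocations spend their budget of 10 checks on unresolved jobs only.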


## SSM Parameters

### Parameters generated by CDK

```
```

### External Parameters required by CDK

```
"/icav2/umccr-prod/tso500_ctdna_2.1_pipeline_id"
"/icav2_copy_batch_utility/state_machine_arn"
```


14 changes: 14 additions & 0 deletions lib/workload/stateless/icav2_copy_batch_utility/deploy/README.md
@@ -0,0 +1,14 @@
# Welcome to your CDK TypeScript project

This is a blank project for CDK development with TypeScript.

The `cdk.json` file tells the CDK Toolkit how to execute your app.

## Useful commands

* `npm run build` compile typescript to js
* `npm run watch` watch for changes and compile
* `npm run test` perform the jest unit tests
* `npx cdk deploy` deploy this stack to your default AWS account/region
* `npx cdk diff` compare deployed stack with current state
* `npx cdk synth` emits the synthesized CloudFormation template
@@ -0,0 +1,15 @@
#!/usr/bin/env node
import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import { ICAv2CopyBatchUtilityStack } from '../lib/stacks/icav2_copy_batch_utility_stack';
import { ICAV2_JWT_SECRET_ARN_SSM_PARAMETER_PATH, ICAV2_COPY_BATCH_STATE_MACHINE_ARN_SSM_PARAMETER_PATH } from '../constants';

const app = new cdk.App();
new ICAv2CopyBatchUtilityStack(app, 'Icav2CopyBatchUtilityStack', {
icav2_jwt_ssm_parameter_path: ICAV2_JWT_SECRET_ARN_SSM_PARAMETER_PATH,
icav2_copy_batch_state_machine_ssm_parameter_path: ICAV2_COPY_BATCH_STATE_MACHINE_ARN_SSM_PARAMETER_PATH,
env: {
account: process.env.CDK_DEFAULT_ACCOUNT,
region: process.env.CDK_DEFAULT_REGION
},
});
66 changes: 66 additions & 0 deletions lib/workload/stateless/icav2_copy_batch_utility/deploy/cdk.json
@@ -0,0 +1,66 @@
{
"app": "npx ts-node --prefer-ts-exts bin/icav2_copy_batch_utility.ts",
"watch": {
"include": [
"**"
],
"exclude": [
"README.md",
"cdk*.json",
"**/*.d.ts",
"**/*.js",
"tsconfig.json",
"package*.json",
"yarn.lock",
"node_modules",
"test"
]
},
"context": {
"@aws-cdk/aws-lambda:recognizeLayerVersion": true,
"@aws-cdk/core:checkSecretUsage": true,
"@aws-cdk/core:target-partitions": [
"aws",
"aws-cn"
],
"@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
"@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
"@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
"@aws-cdk/aws-iam:minimizePolicies": true,
"@aws-cdk/core:validateSnapshotRemovalPolicy": true,
"@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
"@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
"@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
"@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
"@aws-cdk/core:enablePartitionLiterals": true,
"@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
"@aws-cdk/aws-iam:standardizedServicePrincipals": true,
"@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
"@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
"@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
"@aws-cdk/aws-route53-patters:useCertificate": true,
"@aws-cdk/customresources:installLatestAwsSdkDefault": false,
"@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
"@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
"@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
"@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
"@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
"@aws-cdk/aws-redshift:columnId": true,
"@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
"@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
"@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
"@aws-cdk/aws-kms:aliasNameRef": true,
"@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
"@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
"@aws-cdk/aws-efs:denyAnonymousAccess": true,
"@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
"@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
"@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
"@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
"@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
"@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
"@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true,
"@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true,
"@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true
}
}
@@ -0,0 +1,3 @@
export const ICAV2_JWT_SECRET_ARN_SSM_PARAMETER_PATH = "/icav2/umccr-prod/service-user-trial-jwt-token-secret-arn";

export const ICAV2_COPY_BATCH_STATE_MACHINE_ARN_SSM_PARAMETER_PATH = "/icav2_copy_batch_utility/state_machine_arn";
@@ -0,0 +1,8 @@
module.exports = {
testEnvironment: 'node',
roots: ['<rootDir>/test'],
testMatch: ['**/*.test.ts'],
transform: {
'^.+\\.tsx?$': 'ts-jest'
}
};