STG Deployment #56

Merged
merged 11 commits into from
Jun 13, 2024
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,6 @@ cfnnag_output
/sample_file/
/src/layers/umccr_utils/python/
.aws-sam

.yarn
Member

Good move to localise yarn v4, Will. This git ignore looks a bit wide, though.

The localised yarn (v4 or latest) script has to be checked into the repo. Hence, localised.

There are a couple of folders that need exclusion from the git ignore: e.g. `!.yarn/releases` (where the localised yarn script is installed) and `!.yarn/plugins` (for any extensions in use). So on and so forth.
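A sketch of what that narrowed ignore could look like (the exact exclusion list is an assumption, based on common Yarn v4 zero-install conventions):

```gitignore
# Ignore Yarn's internal state, but keep the checked-in pieces
.yarn/*
!.yarn/releases
!.yarn/plugins
!.yarn/patches
!.yarn/versions
```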

See Portal example:

And here is a pointer to the doc:

Member Author

Ahh thanks for this! Will update it

node_modules
1 change: 1 addition & 0 deletions .yarnrc.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
nodeLinker: node-modules
Member

If we do the ^^ git ignore changes on the .yarn part, then this .yarnrc.yml config should change a little bit more to reflect the localised yarn situation.

For example, it will contain a pointer to the localised yarn path.

yarnPath: .yarn/releases/yarn-4.3.0.cjs

See also:
https://github.com/umccr/data-portal-apis/blob/dev/.yarnrc.yml
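Combined with the existing `nodeLinker` setting, the config might look like the following (the yarn version in the path is illustrative, not confirmed for this repo):

```yaml
nodeLinker: node-modules
yarnPath: .yarn/releases/yarn-4.3.0.cjs
```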

12 changes: 10 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
build:
@cdk synth
install:
@corepack enable
@yarn set version stable
Member

If we localise (freeze) it well, we may rearrange `corepack enable` and `yarn set version stable` into another Makefile target (such as `yarn-deps-bump`), as we may not want to pull in the latest yarn with every CI build. Instead, we inform the CI environment (and other developers' local dev) to leverage the localised yarn version from the repo.

We run `yarn set version stable` at another time, when we wish to bump the yarn version itself in a controlled dependency-upgrade manner.
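A sketch of the split this comment suggests (the `yarn-deps-bump` target name is the reviewer's example; the exact commands are an assumption):

```makefile
install:
	@yarn install
	@pip install -r requirements.txt
	@pip install -r src/requirements.txt

# Run only when deliberately bumping the pinned yarn version,
# not on every CI build.
yarn-deps-bump:
	@corepack enable
	@yarn set version stable
```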

Member Author

Right, so I think the yarn version is pinned based on the `packageManager` field in the `package.json` file. It seems that this is the new recommended way of installing yarn, rather than the local binary checked into the repo, per the v4 installation guide: https://yarnpkg.com/blog/release/4.0#installing-yarn

But I do agree that the install target should only contain install commands instead of any env setup, so this could move to another Makefile target.
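With that approach, Corepack reads the pin from `package.json` rather than from a checked-in binary. A minimal sketch (the version number is illustrative):

```json
{
  "name": "samplesheet-check-backend",
  "packageManager": "yarn@4.3.0"
}
```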

@yarn install
@pip install -r requirements.txt
@pip install -r src/requirements.txt


build:
@yarn cdk synth
@sam build -t ./cdk.out/assembly-SSCheckBackEndCdkPipeline-SampleSheetCheckBackEndStage/SSCheckBackEndCdkPipelineSampleSheetCheckBackEndStageSampleSheetCheckBackEnd*.template.json --no-cached

start: build
Expand Down
250 changes: 28 additions & 222 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

The project is the samplesheet check for the UMCCR backend and infrastructure.

The project contains the `app.py` that is the main function of the app and directories:
The project contains the `app.py` which is the main function of the app and directories:

- *src* - where the lambda source code lives
- *stacks* - the stacks with which the code is structured on AWS
Expand All @@ -13,47 +13,48 @@ The project contains the `app.py` that is the main function of the app and direc

It is recommended to create a virtual environment for the app.

To do so please follow the instruction below.
To do so please follow the instructions below.

Change your directory to the root of this readme file.

Create a virtual environment for the app.
```
$ virtualenv .venv --python=python3.11

```sh
virtualenv .venv --python=python3.11
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```sh
source .venv/bin/activate
```

Install all dependencies
```
$ pip install -r src/requirements.txt
$ pip install -r requirements.txt

```sh
make install
```

# Stack Deployment

**Prerequisite**
- A valid SSL Certificate in `us-east-1` region at ACM for all the domain name needed. See [here](app.py#L35) (`alias_domain_name` on the props variable) on what domain need to be included, determined based on which account is deployed.

- A valid SSL Certificate in the `us-east-1` region at ACM for all the domain names needed. See [here](app.py#L33) (`alias_domain_name` on the props variable) for which domains need to be included, determined based on which account is deployed.
- SSM Parameter for the certificate ARN created above with the name of `/sscheck/api/ssl_certificate_arn`

_Deploying the stack without prerequisite above may result in a stack rollback_
_Deploying the stack without the prerequisite above may result in a stack rollback_

There are 2 stacks in this application:
- *data_portal_status_page* - Contains the applications stack
- *pipeline* - Contains the pipeline for the stack to run and self update


To deploy the application stack, you will need to deploy the `pipeline` stack. The pipeline stack will take care of the `sscheck_backend_stack` stack deployment.
- *SSCheckBackEndCdkPipeline/SampleSheetCheckBackEndStage/SampleSheetCheckBackEnd* - Contains the applications stack
- *SSCheckBackEndCdkPipeline* - Contains the pipeline for the stack to run and self-update

To deploy the application stack, you will need to deploy the `pipeline` stack. The pipeline stack will take care of deploying the application stack.

Deploy pipeline stack
```
$ cdk deploy SSCheckBackEndCdkPipeline --profile=${AWS_PROFILE}

```sh
cdk deploy SSCheckBackEndCdkPipeline --profile=${AWS_PROFILE}
```

## Starting API locally
Expand All @@ -64,12 +65,14 @@ sam --version
SAM CLI, version 1.100.

cdk --version
2.115.0 (build 58027ee)
2.114.1 (build 02bbb1d)
```

The local start could configure the domain name for the metadata lookup. Currently, it is pointing to `localhost:8000` where the data-portal-api operate locally,
but alternatively you could change and points to remote domain name (e.g. `api.data.dev.umccr.org` or `api.data.prod.umccr.org`).
Just need to pass in the appropriate bearer token when calling this local endpoint.
The local start can configure the domain name for the metadata lookup. Currently, it points to `localhost:8000`,
where the data-portal-api operates locally, but alternatively you could change it to point to the remote domain name
(e.g. `api.portal.dev.umccr.org` or `api.portal.prod.umccr.org`). This can be done in the `local-start-env-var.json`
file located at the root of the directory. Pass the appropriate bearer token when calling this local endpoint to make
use of the remote metadata endpoint.

To start, simply use the Makefile to start a local API running at `localhost:8001`. Run:
`make start`
Expand All @@ -82,7 +85,7 @@ curl --location 'http://127.0.0.1:8001/' \
--form 'logLevel="ERROR"'
```

You could import this to postman and take advantage of the UI to select the appropriate SS file.
You could import this to [Postman](https://www.postman.com/) and take advantage of the UI to select the appropriate SampleSheet file.


## Deploying sscheck_backend_stack from local
Expand All @@ -95,7 +98,7 @@ $ cdk deploy SSCheckBackEndCdkPipeline/SampleSheetCheckBackEndStage/SampleSheetC

## Syncing Google Sheet to Lab Metadata

This is done every 24 hours (overnight), however if one needs to update the lab metadata on demand, the following code may be of assistance.
This is done every 24 hours (overnight); however, if one needs to update the lab metadata on demand, the following code may be of assistance.

Ensure you're logged in to the right AWS account and then run the following code:

Expand All @@ -104,201 +107,4 @@ aws lambda invoke \
--function-name data-portal-api-prod-labmetadata_scheduled_update_processor \
--output json \
output.json
```

## Testing locally

Some unit tests are in place in the respective test folders. Alternatively, this section gives a tutorial
for making your own testing script.

This tutorial goes through running the samplesheet check functions locally.

This allows a user to debug the code on a failing or passing samplesheet.

### Step 1: Create the conda env

Create a conda env / virtual env that you can deploy your requirements to

```bash
conda create --yes \
--name samplesheet-check-backend \
--channel conda-forge \
pip \
python==3.8
```

Install requirements into conda env / virtual env.

```bash
conda activate samplesheet-check-backend

pip install -r src/layers/requirements.txt
```

Set the PYTHONPATH env var to the layers directory so that the `umccr_utils` are
found.

```bash
mkdir -p "${CONDA_PREFIX}/etc/conda/activate.d"
echo '#!/usr/bin/env bash' > "${CONDA_PREFIX}/etc/conda/activate.d/umccr_utils.sh"
echo "export PYTHONPATH=\"${PWD}/lambdas/layers/:\$PYTHONPATH\"" >> "${CONDA_PREFIX}/etc/conda/activate.d/umccr_utils.sh"
```

Re-activate the conda env.

```bash
conda deactivate
conda activate samplesheet-check-backend
```

### Step 2: Creating a testing script

In order to test our samplesheet, we need to run two separate functions in the samplesheet_check.py script:
* `run_sample_sheet_content_check` which ensures that:
* the samplesheet header has the right settings entered
* none of the indexes clash within each lane set in the samplesheet.
* `run_sample_sheet_check_with_metadata` which ensures that:
* if the library id has a topup suffix, ensure the original sample already exists.
* the assay and type in the labmetadata are set as expected. :construction: # Not yet implemented
* the override cycles in the metadata are consistent with the number of non-N bases in the indexes.
* for each sample, the override cycles all suggest the same number of cycles for each read.

An example shell script testing the samplesheet `samplesheet.csv` is shown below:

This script expects the user to have set the following environment variables:
* `PORTAL_TOKEN` (can be obtained from the data.umccr.org home page)
* `data_portal_domain_name` set to `api.data.prod.umccr.org` or `api.data.dev.umccr.org`
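For example (the token value is a placeholder, not a real credential):

```bash
# Token obtained from the data.umccr.org home page
export PORTAL_TOKEN="<token-from-data.umccr.org>"

# Point metadata lookups at the dev portal API
export data_portal_domain_name="api.data.dev.umccr.org"
```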

```bash
#!/usr/bin/env bash

: '
This script has three sections
1. Setup
- check the portal token env var
- check the samplesheet exists
- check samplesheet_check.py exists
2. Call the run_sample_sheet_content_check function
3. Call the run_sample_sheet_check_with_metadata function
'

# Set to fail
set -euo pipefail

### USER ####

SAMPLESHEET_FILE="SampleSheet.csv"

#############


## GLOBALS ##

SAMPLESHEET_CHECK_SCRIPT="lambdas/functions/samplesheet_check.py"
CONDA_ENV_NAME="samplesheet-check-backend"

#############


### SETUP ###

if [[ -z "${PORTAL_TOKEN-}" ]]; then
echo "Error: Could not get the env var 'PORTAL_TOKEN'. Exiting" 1>&2
exit 1
fi

if [[ -z "${data_portal_domain_name-}" ]]; then
echo "Error: Could not get the env var 'data_portal_domain_name'. Exiting" 1>&2
exit 1
fi

if [[ ! -f "${SAMPLESHEET_FILE}" ]]; then
echo "Error: Could not find the file '${SAMPLESHEET_FILE}'" 1>&2
exit 1
fi

if [[ ! -f "${SAMPLESHEET_CHECK_SCRIPT}" ]]; then
echo "Error: Could not find the file '${SAMPLESHEET_CHECK_SCRIPT}'" 1>&2
exit 1
fi

#############


### TESTS ###

python_file="$(mktemp)"

cat << EOF > "${python_file}"
#!/usr/bin/env python3

# Imports
import asyncio

from samplesheet.samplesheet_check import run_sample_sheet_content_check
from samplesheet.samplesheet_check import run_sample_sheet_check_with_metadata
from utils.samplesheet import SampleSheet

# Get auth header for portal
auth_header = "Bearer ${PORTAL_TOKEN}"

# Get samplesheet
sample_sheet_path = "${SAMPLESHEET_FILE}"
sample_sheet = SampleSheet(sample_sheet_path)

# Check 1
run_sample_sheet_content_check(sample_sheet)

# Check 2
async def set_and_check_metadata(sample_sheet, auth_header):
    # Set metadata from the portal API
    loop = asyncio.get_running_loop()
    error = await asyncio.gather(
        sample_sheet.set_metadata_df_from_api(auth_header, loop),
    )

    # Run metadata check
    run_sample_sheet_check_with_metadata(sample_sheet)

loop = asyncio.new_event_loop()
loop.run_until_complete(set_and_check_metadata(sample_sheet, auth_header))
loop.close()
EOF

echo "Running samplesheet '${SAMPLESHEET_FILE}' through check script '${SAMPLESHEET_CHECK_SCRIPT}'" 1>&2
conda run \
--name "${CONDA_ENV_NAME}" \
python3 "${python_file}"

echo "Test complete!" 1>&2
rm "${python_file}"

#############
```

## Testing through API

This goes through running the samplesheet check against the API.

This tests the deployed version of the samplesheet check, rather than the local file content.

You can use the curl binary to make a POST request to the API.

This script expects the user to have the following environment variables:
* `PORTAL_TOKEN` (can be obtained from the data.umccr.org home page)

```bash
API_URL="https://api.sscheck.prod.umccr.org" # Dev URL: https://api.sscheck.dev.umccr.org
SAMPLESHEET_FILE="SampleSheet.csv"

curl \
--location \
--request POST \
--header "Authorization: Bearer ${PORTAL_TOKEN}" \
--form "logLevel=ERROR" \
--form "file=@${SAMPLESHEET_FILE}" \
"${API_URL}"
```

Member

Is this README test section still applicable? Or perhaps you have another plan with it, or are rearranging/refactoring it? 🙄

Especially this last section, which helped me leverage the sscheck API endpoint, you know. Perhaps it's about time to create /docs etc.

If you have a plan, that's ok with these changes. We can always dig through git history, if any.

Member Author

@williamputraintan Jun 13, 2024

Yeah, could maybe put this in docs instead of removing it. It is because I put in local invoke via the SAM CLI, so testing locally could be done via the curl API instead of this script. The local invoke API can be configured so that it runs locally but uses the remote metadata endpoint with just `make start` (and then use the curl/Postman command to check the local samplesheet file).

Member

Yup. A Developer guide | User guide would be handy, Will.

16 changes: 8 additions & 8 deletions app.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,28 +10,28 @@
account_id = os.environ.get('CDK_DEFAULT_ACCOUNT')
aws_region = os.environ.get('CDK_DEFAULT_REGION')

# Determine account stage (Identify if it is running on prod or dev)
if account_id == "472057503814": # Account number used for production environment
# Determine account stage (Identify if it is running on prod, stg, or dev)
if account_id == "472057503814": # Prod account
app_stage = "prod"
elif account_id == "455634345446": # Staging account
app_stage = "stg"
else:
app_stage = "dev"

props = {
"pipeline_name": {
"dev": "sscheck-backend",
"prod": "sscheck-backend"
},
"pipeline_artifact_bucket_name": {
"dev": "sscheck-backend-artifact-dev",
"stg": "sscheck-backend-artifact-stg",
"prod": "sscheck-backend-artifact-prod"
},
"repository_source": "umccr/samplesheet-check-backend",
"branch_source": {
"dev": "dev",
"stg": "stg",
"prod": "main"
},
"alias_domain_name": {
"dev": ["api.sscheck.dev.umccr.org"],
"stg": ["api.sscheck.stg.umccr.org"],
"prod": ["api.sscheck.umccr.org", "api.sscheck.prod.umccr.org"]
}
}
Expand All @@ -48,7 +48,7 @@
"SSCheckBackEndCdkPipeline",
stack_name="sscheck-backend-pipeline",
tags={
"sstage": app_stage,
"stage": app_stage,
"stack": "sscheck-backend-pipeline"
}
)
Expand Down
7 changes: 7 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{
"name": "samplesheet-check-backend",
"packageManager": "[email protected]",
"devDependencies": {
"aws-cdk": "2.145.0"
}
}
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
aws-cdk-lib==2.114.1
aws-cdk-lib==2.145.0
constructs>=10.0.0