STG Deployment #56
Changes from 7 commits
537a4eb
e954945
4fdf11d
0d09e41
3d7dc09
abc6fc7
c38ce93
f2cdf42
af6ad26
69dd5af
c99522b
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,3 +17,6 @@ cfnnag_output | |
/sample_file/ | ||
/src/layers/umccr_utils/python/ | ||
.aws-sam | ||
|
||
.yarn | ||
node_modules |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
nodeLinker: node-modules | ||
If we do ^^ git ignore changes on For example, it will contain pointer to localised yarn path.
See also: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,13 @@ | ||
build: | ||
@cdk synth | ||
install: | ||
@corepack enable | ||
@yarn set version stable | ||
If we localised (frozen) it well, we may rearrange We run these
Right, so I think the set yarn version is pinned based on the But do agree that the install target should only be install commands instead of any env setup, so this could move to another makefile target. |
||
@yarn install | ||
@pip install -r requirements.txt | ||
@pip install -r src/requirements.txt | ||
|
||
|
||
build: | ||
@yarn cdk synth | ||
@sam build -t ./cdk.out/assembly-SSCheckBackEndCdkPipeline-SampleSheetCheckBackEndStage/SSCheckBackEndCdkPipelineSampleSheetCheckBackEndStageSampleSheetCheckBackEnd*.template.json --no-cached | ||
|
||
start: build | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
|
||
Project is the samplesheet check for the UMCCR backend and infrastructure. | ||
|
||
The project contains the `app.py` that is the main function of the app and directories: | ||
The project contains the `app.py` which is the main function of the app and directories: | ||
|
||
- *src* - the lambdas source code lives | ||
- *stacks* - the stack for which the code is structured at AWS | ||
|
@@ -13,47 +13,48 @@ The project contains the `app.py` that is the main function of the app and direc | |
|
||
It is recommended to create a virtual environment for the app. | ||
|
||
To do so please follow the instruction below. | ||
To do so please follow the instructions below. | ||
|
||
Change your directory to the root of this readme file. | ||
|
||
Create a virtual environment for the app. | ||
``` | ||
$ virtualenv .venv --python=python3.11 | ||
|
||
```sh | ||
virtualenv .venv --python=python3.11 | ||
``` | ||
|
||
After the init process completes and the virtualenv is created, you can use the following | ||
step to activate your virtualenv. | ||
|
||
``` | ||
$ source .venv/bin/activate | ||
```sh | ||
source .venv/bin/activate | ||
``` | ||
|
||
Install all dependencies | ||
``` | ||
$ pip install -r src/requirements.txt | ||
$ pip install -r requirements.txt | ||
|
||
```sh | ||
make install | ||
``` | ||
|
||
# Stack Deployment | ||
|
||
**Prerequisite** | ||
- A valid SSL Certificate in `us-east-1` region at ACM for all the domain name needed. See [here](app.py#L35) (`alias_domain_name` on the props variable) on what domain need to be included, determined based on which account is deployed. | ||
|
||
- A valid SSL Certificate in `us-east-1` region at ACM for all the domain name needed. See [here](app.py#L33) (`alias_domain_name` on the props variable) on what domain needs to be included, determined based on which account is deployed. | ||
- SSM Parameter for the certificate ARN created above with the name of `/sscheck/api/ssl_certificate_arn` | ||
|
||
_Deploying the stack without prerequisite above may result in a stack rollback_ | ||
_Deploying the stack without the prerequisite above may result in a stack rollback_ | ||
|
||
There are 2 stacks in this application: | ||
- *data_portal_status_page* - Contains the applications stack | ||
- *pipeline* - Contains the pipeline for the stack to run and self update | ||
|
||
|
||
To deploy the application stack, you will need to deploy the `pipeline` stack. The pipeline stack will take care of the `sscheck_backend_stack` stack deployment. | ||
- *SSCheckBackEndCdkPipeline/SampleSheetCheckBackEndStage/SampleSheetCheckBackEnd* - Contains the applications stack | ||
- *SSCheckBackEndCdkPipeline* - Contains the pipeline for the stack to run and self-update | ||
|
||
To deploy the application stack, you will need to deploy the `pipeline` stack. The pipeline stack will take care of the application stack. | ||
|
||
Deploy pipeline stack | ||
``` | ||
$ cdk deploy SSCheckBackEndCdkPipeline --profile=${AWS_PROFILE} | ||
|
||
```sh | ||
cdk deploy SSCheckBackEndCdkPipeline --profile=${AWS_PROFILE} | ||
``` | ||
|
||
## Starting API locally | ||
|
@@ -64,12 +65,14 @@ sam --version | |
SAM CLI, version 1.100. | ||
|
||
cdk --version | ||
2.115.0 (build 58027ee) | ||
2.114.1 (build 02bbb1d) | ||
``` | ||
|
||
The local start could configure the domain name for the metadata lookup. Currently, it is pointing to `localhost:8000` where the data-portal-api operate locally, | ||
but alternatively you could change and points to remote domain name (e.g. `api.data.dev.umccr.org` or `api.data.prod.umccr.org`). | ||
Just need to pass in the appropriate bearer token when calling this local endpoint. | ||
The local start could configure the domain name for the metadata lookup. Currently, it is pointing to `localhost:8000` | ||
where the data-portal-api operates locally, but alternatively, you could change and point to the remote domain name | ||
(e.g. `api.portal.dev.umccr.org` or `api.portal.prod.umccr.org`). This can be done on the `local-start-env-var.json` | ||
file located at the root of the directory. The appropriate bearer token when calling this local endpoint to make use of | ||
the remote metadata endpoint. | ||
|
||
To start simply use the makefile to start a local api running in `localhost:8001`. Run: | ||
```make start``` | ||
|
@@ -82,7 +85,7 @@ curl --location 'http://127.0.0.1:8001/' \ | |
--form 'logLevel="ERROR"' | ||
``` | ||
|
||
You could import this to postman and take advantage of the UI to select the appropriate SS file. | ||
You could import this to [Postman](https://www.postman.com/) and take advantage of the UI to select the appropriate SampleSheet file. | ||
|
||
|
||
## Deploying sscheck_backend_stack from local | ||
|
@@ -95,7 +98,7 @@ $ cdk deploy SSCheckBackEndCdkPipeline/SampleSheetCheckBackEndStage/SampleSheetC | |
|
||
## Syncing Google Sheet to Lab Metadata | ||
|
||
This is done every 24 hours (overnight), however if one needs to update the lab metadata on demand, the following code may be of assistance. | ||
This is done every 24 hours (overnight), however, if one needs to update the lab metadata on demand, the following code may be of assistance. | ||
|
||
Ensure you're logged in to the right AWS account and then run the following code: | ||
|
||
|
@@ -104,201 +107,4 @@ aws lambda invoke \ | |
--function-name data-portal-api-prod-labmetadata_scheduled_update_processor \ | ||
--output json \ | ||
output.json | ||
``` | ||
|
||
## Testing locally | ||
|
||
Some unit test is in placed in the respective test folder. Alternatively, this section will give a tutorial | ||
to make your own testing script. | ||
|
||
This tutorial goes through running the samplesheet check functions locally. | ||
|
||
This allows a user to debug the code on a failing or passing samplesheet. | ||
|
||
### Step 1: Create the conda env | ||
|
||
Create a conda env / virtual env that you can deploy your requirements to | ||
|
||
```bash | ||
conda create --yes \ | ||
--name samplesheet-check-backend \ | ||
--channel conda-forge \ | ||
pip \ | ||
python==3.8 | ||
``` | ||
|
||
Install requirements into conda env / virtual env. | ||
|
||
```bash | ||
conda activate samplesheet-check-backend | ||
|
||
pip install -r src/layers/requirements.txt | ||
``` | ||
|
||
Set the PYTHONPATH env var to the layers directory so that the `umccr_utils` are | ||
found. | ||
|
||
```bash | ||
mkdir -p "${CONDA_PREFIX}/etc/conda/activate.d" | ||
echo '#!/usr/bin/env bash' > "${CONDA_PREFIX}/etc/conda/activate.d/umccr_utils.sh" | ||
echo "export PYTHONPATH=\"${PWD}/lambdas/layers/:\$PYTHONPATH\"" >> "${CONDA_PREFIX}/etc/conda/activate.d/umccr_utils.sh" | ||
``` | ||
|
||
Re-activate the conda env. | ||
|
||
```bash | ||
conda deactivate | ||
conda activate samplesheet-check-backend | ||
``` | ||
|
||
### Step 2: Creating a testing script | ||
|
||
In order to test our samplesheet, we need to run two separate functions in the samplesheet_check.py script: | ||
* `run_sample_sheet_content_check` which ensures that: | ||
* the samplesheet header has the right settings entered | ||
* none of the indexes clash within each lane set in the samplesheet. | ||
* `run_sample_sheet_check_with_metadata` which ensures that: | ||
* if the library id has a topup suffix, ensure the original sample already exists. | ||
* the assay and type in the labmetadata are set as expected. :construction: # Not yet implemented | ||
* the override cycles in the metadata are consistent with the number of non-N bases in the indexes. | ||
* for each sample, the override cycles all suggest the same number of cycles for each read. | ||
|
||
An example shell script testing the samplesheet `samplesheet.csv` is shown below: | ||
|
||
This script expects the user to have set the following environment variables: | ||
* `PORTAL_TOKEN` (can be obtained from the data.umccr.org home page) | ||
* `data_portal_domain_name` set to `api.data.prod.umccr.org` or `api.data.dev.umccr.org` | ||
|
||
```bash | ||
#!/usr/bin/env bash | ||
|
||
: ' | ||
This script has three sections | ||
1. Setup | ||
- check the portal token env var | ||
- check the samplesheet exists | ||
- check samplesheet_check.py exists | ||
2. Call the run_sample_sheet_content_check function | ||
3. Call the run_sample_sheet_check_with_metadata function | ||
' | ||
|
||
# Set to fail | ||
set -euo pipefail | ||
|
||
### USER #### | ||
|
||
SAMPLESHEET_FILE="SampleSheet.csv" | ||
|
||
############# | ||
|
||
|
||
## GLOBALS ## | ||
|
||
SAMPLESHEET_CHECK_SCRIPT="lambdas/functions/samplesheet_check.py" | ||
CONDA_ENV_NAME="samplesheet-check-backend" | ||
|
||
############# | ||
|
||
|
||
### SETUP ### | ||
|
||
if [[ -z "${PORTAL_TOKEN-}" ]]; then | ||
echo "Error: Could not get the env var 'PORTAL_TOKEN'. Exiting" 1>&2 | ||
exit 1 | ||
fi | ||
|
||
if [[ -z "${data_portal_domain_name-}" ]]; then | ||
echo "Error: Could not get the env var 'data_portal_domain_name'. Exiting" 1>&2 | ||
exit 1 | ||
fi | ||
|
||
if [[ ! -f "${SAMPLESHEET_FILE}" ]]; then | ||
echo "Error: Could not find the file '${SAMPLESHEET_FILE}'" 1>&2 | ||
exit 1 | ||
fi | ||
|
||
if [[ ! -f "${SAMPLESHEET_CHECK_SCRIPT}" ]]; then | ||
echo "Error: Could not find the file '${SAMPLESHEET_CHECK_SCRIPT}'" 1>&2 | ||
exit 1 | ||
fi | ||
|
||
############# | ||
|
||
|
||
### TESTS ### | ||
|
||
python_file="$(mktemp)" | ||
|
||
cat << EOF > "${python_file}" | ||
#!/usr/bin/env python3 | ||
|
||
# Imports | ||
import asyncio | ||
from samplesheet.samplesheet_check import run_sample_sheet_content_check | ||
from samplesheet.samplesheet_check import run_sample_sheet_check_with_metadata | ||
from utils.samplesheet import SampleSheet | ||
|
||
# Get auth header for portal | ||
auth_header = "Bearer ${PORTAL_TOKEN}" | ||
|
||
# Get samplesheet | ||
sample_sheet_path = "${SAMPLESHEET_FILE}" | ||
sample_sheet = SampleSheet(sample_sheet_path) | ||
|
||
# Check 1 | ||
run_sample_sheet_content_check(sample_sheet) | ||
|
||
# Check 2 | ||
async def set_and_check_metadata(sample_sheet, auth_header): | ||
    # Set metadata | ||
    loop = asyncio.get_running_loop() | ||
    error = await asyncio.gather( | ||
        sample_sheet.set_metadata_df_from_api(auth_header, loop), | ||
    ) | ||
|
||
    # Run metadata check | ||
    run_sample_sheet_check_with_metadata(sample_sheet) | ||
|
||
loop = asyncio.new_event_loop() | ||
loop.run_until_complete(set_and_check_metadata(sample_sheet, auth_header)) | ||
loop.close() | ||
|
||
|
||
|
||
|
||
EOF | ||
|
||
echo "Running samplesheet '${SAMPLESHEET_FILE}' through check script '${SAMPLESHEET_CHECK_SCRIPT}'" 1>&2 | ||
conda run \ | ||
--name "${CONDA_ENV_NAME}" \ | ||
python3 "${python_file}" | ||
|
||
echo "Test complete!" 1>&2 | ||
rm "${python_file}" | ||
|
||
############# | ||
``` | ||
|
||
## Testing through API | ||
|
||
This goes through running the samplesheet check against the API. | ||
|
||
This tests the deployed version of the samplesheet check, rather than the local file content. | ||
|
||
You can use the curl binary to make a POST request to the API. | ||
|
||
This script expects the user to have the following environment variables: | ||
* `PORTAL_TOKEN` (can be obtained from the data.umccr.org home page) | ||
|
||
```bash | ||
API_URL="https://api.sscheck.prod.umccr.org" # Dev URL: https://api.sscheck.dev.umccr.org | ||
SAMPLESHEET_FILE="SampleSheet.csv" | ||
|
||
curl \ | ||
--location \ | ||
--request POST \ | ||
--header "Authorization: Bearer ${PORTAL_TOKEN}" \ | ||
--form "logLevel=ERROR" \ | ||
--form "file=@${SAMPLESHEET_FILE}" \ | ||
"${API_URL}" | ||
``` | ||
|
||
Is this README test section still applicable? Or perhaps you have another plan for it, or are rearranging/refactoring it? 🙄 This last section especially helps me see how to leverage the sscheck API endpoint, you know. Perhaps about time create If you have a plan, that's ok with these changes. We can always dig through the git history, if any.
Yeah could maybe put this to
Yup. Developer guide | User guide would be handy, Will. |
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"name": "samplesheet-check-backend", | ||
"packageManager": "[email protected]", | ||
"devDependencies": { | ||
"aws-cdk": "2.145.0" | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
aws-cdk-lib==2.114.1 | ||
aws-cdk-lib==2.145.0 | ||
constructs>=10.0.0 |
Good move to localised yarn v4, Will. This git ignore looks a bit wide.
The localised yarn (v4 or latest) script has to be checked into the repo. Hence, localised.
There should be a couple of folders that need exclusion from the git ignore, e.g. `!.yarn/releases` (where the localised yarn script is installed) and `!.yarn/plugins` (any extensions in use), and so on.
See Portal example:
And here is pointer to doc:
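A minimal sketch of the narrower ignore rules suggested in the comment above, assuming only the two folders it names need to stay tracked. The helper name `append_yarn_ignore_rules` and the scratch directory are hypothetical and not part of this PR. Note that the blanket `.yarn` entry would have to become `.yarn/*`, since git cannot re-include a path whose parent directory is itself ignored.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append narrower Yarn ignore rules to a given .gitignore file
append_yarn_ignore_rules() {
  local gitignore_path="$1"
  cat >> "${gitignore_path}" << 'EOF'
# Ignore Yarn's local state, but keep the checked-in release script and plugins
.yarn/*
!.yarn/releases
!.yarn/plugins
EOF
}

# Demonstrate against a scratch file rather than the repository's real .gitignore
scratch_dir="$(mktemp -d)"
append_yarn_ignore_rules "${scratch_dir}/.gitignore"
cat "${scratch_dir}/.gitignore"
```

With rules like these in place, `git check-ignore -v <path>` is one way to confirm which pattern applies to a given path under `.yarn`.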
Ahh thanks for this! Will update it