Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Friday changes: #589

Merged
merged 7 commits into from
Oct 8, 2024
Merged

Friday changes: #589

merged 7 commits into from
Oct 8, 2024

Conversation

alexiswl
Copy link
Member

@alexiswl alexiswl commented Oct 4, 2024

PierianDx

  • Added pieriandx pipeline manager (monitors runs and sends events once the reports are generated)
  • Added nails for stacky to connect cttsov2 to pieriandx pipeline

Workflow Type Renames

  • wgtsQc to wgts-qc
  • tumor_normal to tumor-normal

ICAv2 Resilience

  • Allow a ‘rerun’ of a workflow execution if the analysis id doesn’t exist (but the portal id is in the database).

  • Reduce concurrency in cttsov2 copy jobs to 5

  • Reduce concurrency in bsshfastqcopy jobs to 5

Metadata

  • Refactored metadata tools to match new metadata manager structure in dev
  • Made metadata mapper lambda construct that allows user to easily collect an orcabus id from an internal id (library id / subject id / project id etc) and vice versa.

Showers

Don’t produce subject based showers.

Stacky

Refactored stacky services to not register subjects / instruments, libraries only, use a scan to find other libraries that have the same subject / on same instrument instead. Reduces number of components within the stack.

## PierianDx

* Added pieriandx pipeline manager (monitors runs and sends events once the reports are generated)
* Added nails for stacky

## Workflow Type Renames

* wgtsQc to wgts-qc
* tumor_normal to tumor-normal

## ICAv2 Resilience

* Allow a ‘rerun’ of a workflow execution if the analysis id doesn’t exist (but the portal id is in the database).

* Reduce concurrency in cttsov2 copy jobs to 5

* Reduce concurrency in bsshfastqcopy jobs to 5

## Metadata

* Refactored metadata tools to match new metadata manager structure in dev
* Made metadata mapper lambda construct that allows user to easily collect an orcabus id from an internal id (library id / subject id / project id etc) and vice versa.

## Showers

Don’t produce subject based showers.

## Stacky

Refactored stacky services to not register subjects / instruments, libraries only, use a scan to find other libraries that have the same subject / on same instrument instead. Reduces number of components within the stack.
@alexiswl alexiswl added the feature New feature label Oct 4, 2024
@alexiswl alexiswl self-assigned this Oct 4, 2024
@victorskl
Copy link
Member

Massive changes.! Going through today..

@reisingerf
Copy link
Member

Uh... way too much for me to go through in detail, so will focus on a few points of interest...

@victorskl
Copy link
Member

Hey I have 3 coffee ☕ today. All double shots! So why not.

Copy link
Member

@reisingerf reisingerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments, but no blocker from my side...

Comment on lines -294 to 295
export const wgtsQcIcav2PipelineWorkflowType = 'wgtsQc';
export const wgtsQcIcav2PipelineWorkflowType = 'wgts-qc';
export const wgtsQcIcav2PipelineWorkflowTypeVersion = '4.2.4';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of those are ...WorkflowType and ...WorkflowTypeVersion combinations,
but there are some that are ...WorkflowName and ...WorkflowTypeVersion or
...WorkflowName, ...WorkflowVersion

The rename/update does not affect all of those?
E.g. bsshFastqCopy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah let me fix bsshFastqCopy too, bsshFastqCopy is not used in any output paths so got less attention but I think it would be good to fix for workflow run name consistency

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw: are we OK with cttsov2?

First, we are adding a version where the workflow definition has already a version field. That feels redundant and confusing and may limit our future uses of that workflow (e.g. what if there's a new version 3.0.1?)
Second, we've seen now multiple other places where the analysis and the sample type are separated, e.g. tso-ctdna. I wonder if we are limiting the possible use cases combining those into one term, e.g. could there be a tso-ctrna, a tso600-ctdna or others?

I just want to take this opportunity to rethink our workflow naming based on the experience we have now, not simply following the convenience of what we used so far.
I am not the best person to judge this though, so I'll follow the consensus

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a discussion on Slack where we can reach a consensus.

Get the Step Functions client
:return: SFNClient
"""
return boto3.client('stepfunctions')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps initialise the client statically?


The AWS Launch step function takes in the following inputs

* dag: Object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, what's a dag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question that not even the PierianDx docs can explain. A dag comprises a name and version (in our case - cromwell_tso500_ctdna_workflow_1.0.4 / tso500_ctdna_workflow), so I think a dag maps to a workflow and as thus we can infer that dag is the acronym 'directed acyclical graph'

@@ -0,0 +1,247 @@
# PierianDx Pipeline Manager

Giant, serverless pipeline attached to a serverless database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not create WRSC events, right?
Would it make sense to add those, so the WorkflowManager can track them?

(I might have missed it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'nails' glue creates workflow run state change 'ready' events.

The monitor-pieriandx-runs step functions will generate a workflow run state change event once pieriandx has completed

Copy link
Member

@victorskl victorskl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Massive effort!

@@ -11,25 +11,51 @@
},
"Get Libraries from Instrument Run": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Resource": "arn:aws:states:::aws-sdk:dynamodb:scan",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note.

This is ok. Perhaps, create GSI/LSI indexes on those none PK/SK lookup attributes where appropriate.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html

If, however, the use case started to explode such that the need of select * from table use case all the time or, some dynamic with filtering and predicates condition to a table then it is ok to tap into Postgres.

Anyhow. We shall observe this part as-is for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is runs one per instrument run, I don't think this is a burden on our compute resources

Statemachine objects have no object 'current' version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
4 participants