-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Friday changes: #589
Friday changes: #589
Conversation
## PierianDx * Added pieriandx pipeline manager (monitors runs and sends events once the reports are generated) * Added nails for stacky ## Workflow Type Renames * wgtsQc to wgts-qc * tumor_normal to tumor-normal ## ICAv2 Resilience * Allow a ‘rerun’ of a workflow execution if the analysis id doesn’t exist (but the portal id is in the database). * Reduce concurrency in cttsov2 copy jobs to 5 * Reduce concurrency in bsshfastqcopy jobs to 5 ## Metadata * Refactored metadata tools to match new metadata manager structure in dev * Made metadata mapper lambda construct that allows user to easily collect an orcabus id from an internal id (library id / subject id / project id etc) and vice versa. ## Showers Don’t produce subject based showers. ## Stacky Refactored stacky services to not register subjects / instruments, libraries only, use a scan to find other libraries that have the same subject / on same instrument instead. Reduces number of components within the stack.
Massive changes.! Going through today.. |
Uh... way too much for me to go through in detail, so will focus on a few points of interest... |
Hey I have 3 coffee ☕ today. All double shots! So why not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments, but no blocker from my side...
export const wgtsQcIcav2PipelineWorkflowType = 'wgtsQc'; | ||
export const wgtsQcIcav2PipelineWorkflowType = 'wgts-qc'; | ||
export const wgtsQcIcav2PipelineWorkflowTypeVersion = '4.2.4'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of those are ...WorkflowType
and ...WorkflowTypeVersion
combinations,
but there are some that are ...WorkflowName
and ...WorkflowTypeVersion
or
...WorkflowName
, ...WorkflowVersion
The rename/update does not affect all of those?
E.g. bsshFastqCopy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah let me fix bsshFastqCopy too, bsshFastqCopy is not used in any output paths so got less attention but I think it would be good to fix for workflow run name consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw: are we OK with cttsov2
?
First, we are adding a version where the workflow definition has already a version
field. That feels redundant and confusing and may limit our future uses of that workflow (e.g. what if there's a new version 3.0.1?)
Second, we've seen now multiple other places where the analysis and the sample type are separated, e.g. tso-ctdna. I wonder if we are limiting the possible use cases combining those into one term, e.g. could there be a tso-ctrna, a tso600-ctdna or others?
I just want to take this opportunity to rethink our workflow naming based on the experience we have now, not simply following the convenience of what we used so far.
I am not the best person to judge this though, so I'll follow the consensus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a discussion on Slack where we can reach a consensus.
Get the Step Functions client | ||
:return: SFNClient | ||
""" | ||
return boto3.client('stepfunctions') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps initialise the client statically?
|
||
The AWS Launch step function takes in the following inputs | ||
|
||
* dag: Object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, what's a dag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great question that not even the PierianDx docs can explain. A dag comprises a name and version (in our case - cromwell_tso500_ctdna_workflow_1.0.4 / tso500_ctdna_workflow), so I think a dag maps to a workflow and as thus we can infer that dag is the acronym 'directed acyclical graph'
@@ -0,0 +1,247 @@ | |||
# PierianDx Pipeline Manager | |||
|
|||
Giant, serverless pipeline attached to a serverless database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not create WRSC events, right?
Would it make sense to add those, so the WorkflowManager can track them?
(I might have missed it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'nails' glue creates workflow run state change 'ready' events.
The monitor-pieriandx-runs step functions will generate a workflow run state change event once pieriandx has completed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Massive effort!
@@ -11,25 +11,51 @@ | |||
}, | |||
"Get Libraries from Instrument Run": { | |||
"Type": "Task", | |||
"Resource": "arn:aws:states:::dynamodb:getItem", | |||
"Resource": "arn:aws:states:::aws-sdk:dynamodb:scan", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note.
This is ok. Perhaps, create GSI/LSI indexes on those none PK/SK lookup attributes where appropriate.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
If, however, the use case started to explode such that the need of select * from table
use case all the time or, some dynamic with filtering and predicates condition to a table then it is ok to tap into Postgres.
Anyhow. We shall observe this part as-is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is runs one per instrument run, I don't think this is a burden on our compute resources
Statemachine objects have no object 'current' version.
PierianDx
Workflow Type Renames
ICAv2 Resilience
Allow a ‘rerun’ of a workflow execution if the analysis id doesn’t exist (but the portal id is in the database).
Reduce concurrency in cttsov2 copy jobs to 5
Reduce concurrency in bsshfastqcopy jobs to 5
Metadata
Showers
Don’t produce subject based showers.
Stacky
Refactored stacky services to not register subjects / instruments, libraries only, use a scan to find other libraries that have the same subject / on same instrument instead. Reduces number of components within the stack.