MetaWorkflow Handler and MetaWorkflow Run Handler: Pipeline Automation #11

Open
wants to merge 43 commits into
base: master
43 commits
15615cd
first draft of mwfh base magma code
vstevensf Sep 28, 2022
165a612
Further editing of baseline MWF handler, with added functions for cal…
vstevensf Oct 24, 2022
ad1cbec
Merge branch 'master' into vs-mwfr-handler
vstevensf Oct 24, 2022
469804d
Baseline Magma FF MWF Handler -- will be modifying the use of copy
vstevensf Oct 24, 2022
6cccc8e
Creation of helper functions that may eventually be added to dcic utils.
vstevensf Oct 24, 2022
f62fabd
Drafts of pytests for baseline Magma MWF Handler and helper functions.
vstevensf Oct 24, 2022
4d50191
Remove extraneous files I use for local testing
vstevensf Oct 24, 2022
81c826d
Added pytests for magma/utils.py
vstevensf Oct 26, 2022
32a576f
Further edits to pytests of magma utils
vstevensf Nov 1, 2022
fa02b5b
Finished topological sort, need to add docstrings and refactor helper…
vstevensf Nov 3, 2022
3e9348d
Modified some tests, removed a few extraneous. Still need to finish t…
vstevensf Nov 3, 2022
dcb737e
Refactored the utils functions for topological sort into its own file.
vstevensf Nov 4, 2022
6bf3809
Added some pytests for topological sort.
vstevensf Nov 9, 2022
0ddf3d1
Completed first draft of completed topological sort tests.
vstevensf Nov 17, 2022
c2231c2
Small changes to utils -- mainly variable naming.
vstevensf Nov 17, 2022
5752c83
Merge branch 'master' into vs-mwfr-handler
vstevensf Nov 17, 2022
dae1229
Small change to test file for topological sort
vstevensf Nov 23, 2022
8795286
Finished validation of MWF handler and its corresponding pytests.
vstevensf Nov 28, 2022
ffefa6a
Put the ValidatedDictionary class in its own file
vstevensf Jan 11, 2023
5fd8916
Finished ValidatedDictionary class
vstevensf Jan 13, 2023
2d16ab0
Refactored Topological Sort with TopologicalSorter from dcicutils.
vstevensf Jan 20, 2023
c6a0e7f
Draft of MWF Handler, without creation of MWFR Handler
vstevensf Jan 20, 2023
6cb41ac
Further edits to basic handler classes
vstevensf Feb 6, 2023
993361d
Merge branch 'master' into vs-mwfr-handler
vstevensf Feb 16, 2023
6321fbf
Check in
vstevensf Apr 21, 2023
f4a47ba
Main changes to create mwfr handler function
vstevensf Apr 21, 2023
ae249bb
More updates the mwfr handler creation
Apr 24, 2023
58add81
Almost final draft of create MWFR handler functionality
vstevensf May 3, 2023
e411ceb
Got rid of duplication flag, for now
vstevensf May 5, 2023
dcbbec0
Merge branch 'master' into vs-mwfr-handler
vstevensf May 5, 2023
2db5bff
Basic running of mwfr handler.
vstevensf May 5, 2023
c1d2a0b
Draft of status checking and updates of run handler
vstevensf May 5, 2023
3c54a4c
Added docstrings to toposort files
vstevensf May 8, 2023
111a0c5
Added docstrings to MWF handler files and tests, and added to magma c…
vstevensf May 8, 2023
ffdabb6
docstrings for mwfr handler class and tests
vstevensf May 8, 2023
d5deff5
quasi updated handler creation docstrings
vstevensf May 8, 2023
4908f1b
some edits to the create run handler pytests -- need to refactor
vstevensf May 10, 2023
c684121
Finalized rough draft of pytests for create mwfr handler functionality
vstevensf May 16, 2023
503cd00
Merge branch 'master' into vs-mwfr-handler
vstevensf May 16, 2023
93c1715
Edited execute handler function and created draft of pytests, plust d…
vstevensf Jun 1, 2023
82c8d55
renamed test file
vstevensf Jun 1, 2023
1d5e77b
modified FFMetaWfrUtils class and pytests
vstevensf Jun 1, 2023
4f17490
Draft of checkstatus tests
vstevensf Jun 24, 2023
51 changes: 51 additions & 0 deletions magma/magma_constants.py
@@ -0,0 +1,51 @@
#!/usr/bin/env python3

#################################################################
# Vars
#################################################################
Comment on lines +1 to +5
Contributor

No need for the shebang or the headers here and elsewhere. I know Michele does this, but I find it unnecessary clutter.

TITLE = "title"

# MetaWorkflow Handler attributes
PROJECT = "project"
INSTITUTION = "institution"
UUID = "uuid"
META_WORKFLOWS = "meta_workflows"
ORDERED_META_WORKFLOWS = "ordered_meta_workflows"
META_WORKFLOW = "meta_workflow"
NAME = "name"
DEPENDENCIES = "dependencies"
ITEMS_FOR_CREATION_PROP_TRACE = "items_for_creation_property_trace"
ITEMS_FOR_CREATION_UUID = "items_for_creation_uuid"

# MetaWorkflow Run Handler attributes
COST = "cost"
STATUS = "status"
FINAL_STATUS = "final_status"
ASSOCIATED_META_WORKFLOW_HANDLER = "meta_workflow_handler"
ASSOCIATED_ITEM = "associated_item"
META_WORKFLOW_RUN = "meta_workflow_run"
META_WORKFLOW_RUNS = "meta_workflow_runs"
ITEMS_FOR_CREATION = "items_for_creation"
ERROR = "error"
# statuses
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
STOPPED = "stopped"

INACTIVE = "inactive"
QC_FAIL = "quality metric failed"


#TODO: the following is here in case dup flag is added in the future
#TODO: add back in
# MWFR_TO_HANDLER_STEP_STATUS_DICT = {
# "pending": "pending",
# "running": "running",
# "completed": "completed",
# "failed": "failed",
# "inactive": "pending",
# "stopped": "stopped",
# "quality metric failed": "failed"
# }
182 changes: 182 additions & 0 deletions magma/metawfl_handler.py
@@ -0,0 +1,182 @@
#!/usr/bin/env python3

################################################
# Libraries
################################################
from copy import deepcopy

from magma.validated_dictionary import ValidatedDictionary
from magma.topological_sort import TopologicalSortHandler
from magma.magma_constants import *
from dcicutils.misc_utils import CycleError

################################################
# Custom Exception classes
################################################
class MetaWorkflowStepCycleError(CycleError):
    """Custom exception for cycle error tracking."""
    pass

class MetaWorkflowStepDuplicateError(ValueError):
    """Custom ValueError when MetaWorkflows don't have unique name attributes."""
    pass

class MetaWorkflowStepSelfDependencyError(ValueError):
    """Custom ValueError when MetaWorkflow Step has a dependency on itself."""
    pass

################################################
# MetaWorkflowStep
################################################
class MetaWorkflowStep(ValidatedDictionary):
    """
    Class to represent a MetaWorkflow,
    as a step within a MetaWorkflow Handler object
    """

    def __init__(self, input_dict):
        """
        Constructor method, initialize object and attributes.

        :param input_dict: a dictionary of MetaWorkflow step metadata
        :type input_dict: dict
        """
        super().__init__(input_dict)

        # Validate presence of basic attributes of this MetaWorkflow step
        self._validate_basic_attributes(META_WORKFLOW, NAME)

        self._check_self_dependency()

    def _validate_basic_attributes(self, *list_of_attributes):
        """
        Validation of the input dictionary for the MetaWorkflow step.
        Checks that necessary MetaWorkflow attributes are present for this MetaWorkflow step.

        :param list_of_attributes: attributes that are checked
        :type list_of_attributes: str(s)
        :return: None, if all specified attributes are present
        :raises ValueError: if this object doesn't have a specified attribute
        :raises AttributeError: if not one (and only one) of items_for_creation attributes is present
        """
        super()._validate_basic_attributes(*list_of_attributes)

        ## Check that one (and only one) of the following attributes is defined on this step:
        ## ITEMS_FOR_CREATION_UUID or ITEMS_FOR_CREATION_PROP_TRACE
        try:
            # set None for [default] arg to not throw AttributeError
            #TODO: handle this within ff instead? It is CGAP portal-specific
            if not getattr(self, ITEMS_FOR_CREATION_UUID, None):
                getattr(self, ITEMS_FOR_CREATION_PROP_TRACE)
        except AttributeError as e:
            raise AttributeError("Object validation error, {0}\n"
                                    .format(e.args[0]))

        # for items for creation, this object can only have
        # either the UUID or property trace, but not both
        if hasattr(self, ITEMS_FOR_CREATION_PROP_TRACE) and hasattr(self, ITEMS_FOR_CREATION_UUID):
            raise AttributeError("Object validation error, 'MetaWorkflowStep' object cannot have both of the following attributes: 'items_for_creation_property_trace' and 'items_for_creation_uuid'")
Comment on lines +66 to +78
Contributor

Consider condensing all the getattr and hasattr calls and the try/except block to obtain the values once, and then validate appropriately:

Suggested change
-        try:
-            # set None for [default] arg to not throw AttributeError
-            #TODO: handle this within ff instead? It is CGAP portal-specific
-            if not getattr(self, ITEMS_FOR_CREATION_UUID, None):
-                getattr(self, ITEMS_FOR_CREATION_PROP_TRACE)
-        except AttributeError as e:
-            raise AttributeError("Object validation error, {0}\n"
-                                    .format(e.args[0]))
-        # for items for creation, this object can only have
-        # either the UUID or property trace, but not both
-        if hasattr(self, ITEMS_FOR_CREATION_PROP_TRACE) and hasattr(self, ITEMS_FOR_CREATION_UUID):
-            raise AttributeError("Object validation error, 'MetaWorkflowStep' object cannot have both of the following attributes: 'items_for_creation_property_trace' and 'items_for_creation_uuid'")
+        uuids_for_creation = getattr(self, ITEMS_FOR_CREATION_UUID, None)
+        property_traces_for_creation = getattr(self, ITEMS_FOR_CREATION_PROP_TRACE, None)
+        if not uuids_for_creation and not property_traces_for_creation:
+            raise error...
+        if uuids_for_creation and property_traces_for_creation:
+            raise error...

As stated elsewhere, strongly consider moving away from setting all these properties as attributes on the classes to escape the getattr and hasattr business.
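
The suggestion above leaves the two `raise error...` lines open. A minimal runnable sketch of the same idea follows, assuming `AttributeError` is kept as the exception type and using a hypothetical free function `validate_items_for_creation` in place of the method. Note one behavioral difference from the PR code: the "both present" check here is truthiness-based, whereas the original uses `hasattr`, so an attribute that exists but is empty is treated as absent.

```python
from types import SimpleNamespace

# Constant values copied from magma/magma_constants.py in this PR
ITEMS_FOR_CREATION_PROP_TRACE = "items_for_creation_property_trace"
ITEMS_FOR_CREATION_UUID = "items_for_creation_uuid"


def validate_items_for_creation(step):
    """Require exactly one of the two items_for_creation attributes on `step`.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Read each value once, defaulting to None when the attribute is missing
    uuids_for_creation = getattr(step, ITEMS_FOR_CREATION_UUID, None)
    property_traces_for_creation = getattr(step, ITEMS_FOR_CREATION_PROP_TRACE, None)

    if not uuids_for_creation and not property_traces_for_creation:
        raise AttributeError(
            "Object validation error, one of the following attributes is required: "
            f"'{ITEMS_FOR_CREATION_PROP_TRACE}' or '{ITEMS_FOR_CREATION_UUID}'"
        )
    if uuids_for_creation and property_traces_for_creation:
        raise AttributeError(
            "Object validation error, cannot have both of the following attributes: "
            f"'{ITEMS_FOR_CREATION_PROP_TRACE}' and '{ITEMS_FOR_CREATION_UUID}'"
        )


# Usage: a step carrying only a UUID passes validation silently
validate_items_for_creation(SimpleNamespace(items_for_creation_uuid=["some-uuid"]))
```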


    def _check_self_dependency(self):
        """
        Check that this MetaWorkflow Step object doesn't have a self-dependency.

        :return: None, if no self-dependencies present
        :raises MetaWorkflowStepSelfDependencyError: if there is a self-dependency
        """
        if hasattr(self, DEPENDENCIES):
            dependencies = getattr(self, DEPENDENCIES)
Comment on lines +87 to +88
Contributor

Consider condensing to provide a default of an empty list:

Suggested change
-        if hasattr(self, DEPENDENCIES):
-            dependencies = getattr(self, DEPENDENCIES)
+        dependencies = getattr(self, DEPENDENCIES, [])

            for dependency in dependencies:
                if dependency == getattr(self, NAME):
                    raise MetaWorkflowStepSelfDependencyError(f'"{dependency}" has a self dependency.')


################################################
# MetaWorkflowHandler
################################################
class MetaWorkflowHandler(ValidatedDictionary):
    """
    Class representing a MetaWorkflow Handler object,
    including a list of MetaWorkflows with specified dependencies & other metadata
    """

    def __init__(self, input_dict):
        """
        Constructor method, initialize object and attributes.

        :param input_dict: MetaWorkflow Handler dict, defined by json file from CGAP portal
        :type input_dict: dict
        """
        ### Basic attributes ###
        super().__init__(input_dict)

        super()._validate_basic_attributes(UUID)

        ### Calculated attributes ###
        # set meta_workflows attribute
        # Using meta_workflows array of dicts from CGAP MetaWorkflow Handler
        # create dict of the form {meta_workflow_name: MetaWorkflow Step object}
        self._set_meta_workflows_dict()
        # TODO: NOTE: nowhere in magma is there a check that meta_workflows
        # is an empty list. I am putting the burden of that on the user
        # would y'all like me to add a check for an empty list? or NoneType?
        # right now I only catch instances where meta_workflows doesn't exist,
        # and I create an empty dict

        # Create ordered MetaWorkflows name list based on dependencies
        # This ordered list is what's used to create the array of MetaWorkflow Runs in Run handler
Comment on lines +110 to +127
Contributor

Consider cutting down on the commenting here and updating the method names to reflect their mission better. Having too many comments usually reflects difficult-to-follow code in need of refactoring, and I find this class confusing even though the logic isn't terrible, mostly because of how the attributes are handled.

        setattr(self, ORDERED_META_WORKFLOWS, self._create_ordered_meta_workflows_list())

    def _set_meta_workflows_dict(self):
        """
        Checks for meta_workflows attribute (an array of MetaWorkflows and their metadata) from CGAP portal.

        If nonexistent, set handler's meta_workflows attribute as an empty dictionary
        If present, copy that list temporarily and redefine meta_workflows attribute
        as a dictionary of the form {meta_workflow_name: MetaWorkflow Step object,....}
        checking for duplicate steps in the process (i.e. non-unique MetaWorkflow names)

        :return: None, if all MetaWorkflowSteps are created successfully
        :raises MetaWorkflowStepDuplicateError: if there are duplicate MetaWorkflows, by name
        """
        if not hasattr(self, META_WORKFLOWS):
            # if not present, set attribute as empty dictionary
            setattr(self, META_WORKFLOWS, {})
        else:
            orig_meta_workflow_list_copy = deepcopy(getattr(self, META_WORKFLOWS))

            temp_meta_workflow_step_dict = {}

            for meta_workflow in orig_meta_workflow_list_copy:
                # create MetaWorkflowStep object for this MetaWorkflow
                meta_workflow_step = MetaWorkflowStep(meta_workflow)

                # then add to the meta_workflows dictionary
                # of the form {meta_workflow["name"]: MetaWorkflowStep(meta_workflow)}
                if temp_meta_workflow_step_dict.setdefault(meta_workflow["name"], meta_workflow_step) != meta_workflow_step:
                    raise MetaWorkflowStepDuplicateError(f'"{meta_workflow["name"]}" is a duplicate MetaWorkflow, \
                        all MetaWorkflow names must be unique.')
Comment on lines +156 to +158
Contributor

Consider refactoring away from the setdefault method, which is rather uncommon, to checking simply whether the name is already present in the dictionary being built.

Also, probably want to use a constant for name here.
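
One way the membership check could look is sketched below. The function `build_steps_by_name` and its `step_factory` parameter are hypothetical stand-ins (the PR builds `MetaWorkflowStep` objects inside `_set_meta_workflows_dict`); only the `NAME` constant and the exception class name come from the diff. Adjacent string literals are used for the message because a backslash continuation inside a string literal keeps the next line's leading whitespace in the message.

```python
NAME = "name"  # value copied from magma/magma_constants.py in this PR


class MetaWorkflowStepDuplicateError(ValueError):
    """Raised when two MetaWorkflows share the same name."""


def build_steps_by_name(meta_workflows, step_factory=dict):
    """Map each MetaWorkflow name to its step object, rejecting duplicates.

    `step_factory` stands in for MetaWorkflowStep in this sketch.
    """
    steps_by_name = {}
    for meta_workflow in meta_workflows:
        name = meta_workflow[NAME]
        # Plain membership test instead of comparing setdefault's return value
        if name in steps_by_name:
            raise MetaWorkflowStepDuplicateError(
                f'"{name}" is a duplicate MetaWorkflow, '
                "all MetaWorkflow names must be unique."
            )
        steps_by_name[name] = step_factory(meta_workflow)
    return steps_by_name


# Usage: two uniquely named steps produce a two-entry dictionary
steps = build_steps_by_name([{"name": "A"}, {"name": "B", "dependencies": ["A"]}])
```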


            # redefine the "meta_workflows" attribute to this generated dictionary of MetaWorkflowStep objects
            setattr(self, META_WORKFLOWS, temp_meta_workflow_step_dict)
Contributor

Consider refactoring away from redefining the attribute as information is lost in the process, and there's no way to look at both easily when debugging.
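
A rough sketch of what keeping both forms might look like, assuming a separate attribute for the derived lookup. The class `HandlerSketch` and the attribute name `meta_workflow_steps` are hypothetical, and plain dicts stand in for `MetaWorkflowStep` objects.

```python
from copy import deepcopy


class HandlerSketch:
    """Illustrative only: keeps the portal's list and the derived dict side by side."""

    def __init__(self, input_dict):
        # Original list from the portal, left untouched so it can be inspected later
        self.meta_workflows = deepcopy(input_dict.get("meta_workflows", []))
        # Derived lookup stored under its own (hypothetical) attribute name
        self.meta_workflow_steps = {
            meta_workflow["name"]: meta_workflow
            for meta_workflow in self.meta_workflows
        }


# Usage: both views remain available on the same object
handler = HandlerSketch({"meta_workflows": [{"name": "A"}, {"name": "B"}]})
```

With this layout the list order from the portal and the by-name lookup can be compared directly in a debugger.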


    def _create_ordered_meta_workflows_list(self):
        """
        Using dictionary of MetaWorkflow name and their corresponding MetaWorkflowStep objects,
        generate ordered list of MetaWorkflows, by name.
        Uses TopologicalSorter to order these steps based on their defined dependencies.

        :return: list of valid topological sorting of MetaWorkflows (by name)
        :rtype: list[str]
        :raises MetaWorkflowStepCycleError: if there are cyclic dependencies among MetaWorkflow steps
            i.e. no valid topological sorting of steps
        """
        meta_workflows_dict = getattr(self, META_WORKFLOWS)

        try:
            # create "graph" that will be passed into the topological sorter
            sorter = TopologicalSortHandler(meta_workflows_dict)
            # now topologically sort the steps
            return sorter.sorted_graph_list()
        except CycleError:
            raise MetaWorkflowStepCycleError()
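
For readers unfamiliar with the ordering step, the sketch below shows the same sort-then-translate-the-cycle-error pattern using the standard library's `graphlib` (Python 3.9+). This is a stand-in, not the PR's implementation: the PR goes through `TopologicalSortHandler` and the `CycleError` from `dcicutils`, whose internals are not shown here. The function `order_steps` is hypothetical, and chaining with `from` is an addition that preserves the original cycle details.

```python
from graphlib import CycleError, TopologicalSorter


class MetaWorkflowStepCycleError(CycleError):
    """Raised when MetaWorkflow steps depend on each other cyclically."""


def order_steps(dependencies_by_name):
    """Return step names so that every step follows all of its dependencies.

    `dependencies_by_name` maps a step name to the names it depends on.
    """
    sorter = TopologicalSorter(dependencies_by_name)
    try:
        # static_order() detects cycles lazily, so consume it inside the try
        return list(sorter.static_order())
    except CycleError as error:
        # Re-raise as the handler-specific error, keeping the original as the cause
        raise MetaWorkflowStepCycleError(*error.args) from error


# Usage: a linear chain A <- B <- C has exactly one valid ordering
ordered = order_steps({"A": [], "B": ["A"], "C": ["B"]})
```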