-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add pydra.tasks.core sequence tasks #434
Open
effigies
wants to merge
11
commits into
nipype:master
Choose a base branch
from
effigies:enh/core_tasks
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+223
−8
Open
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
d2cf9b6
ENH: Add Merge task in pydra.tasks.core.sequences
effigies dd42f63
FIX: Only evolve inputs if inputs are provided
effigies a41e83c
FIX: Allow input_spec/output_spec to be class variables
effigies 205d600
PY37: Use typing_extensions to get Literal in Python 3.7
effigies 268987a
ENH: Add Select task
effigies 720ac0e
FIX: Update decorator dunders to allow docstrings and doctests to pas…
effigies d725ece
FIX: Not all Tasks have input_spec class attributes
effigies 0bf450a
ENH: Add Split task
effigies 8442d5f
FIX: Allow input_spec to be a spec
effigies 60b062d
CI: Build docs with Python 3.8.6
effigies 7894d0b
FIX: Only check for self.input_spec if not in state
effigies File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
""" | ||
Pydra provides a small number of utility tasks that are aimed at common manipulations | ||
of Python objects. | ||
""" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,197 @@ | ||
import attr | ||
import typing as ty | ||
import pydra | ||
from pydra.engine.specs import BaseSpec, SpecInfo, MultiInputObj, MultiOutputObj | ||
from pydra.engine.core import TaskBase | ||
from pydra.engine.helpers import ensure_list | ||
|
||
try: | ||
from typing import Literal | ||
except ImportError: # PY37 | ||
from typing_extensions import Literal | ||
|
||
|
||
@attr.s(kw_only=True) | ||
class SplitInputSpec(BaseSpec): | ||
inlist: ty.List = attr.ib(metadata={"help_string": "List of values to split"}) | ||
splits: ty.List[int] = attr.ib( | ||
metadata={ | ||
"help_string": "Number of outputs in each split - should add to number of inputs" | ||
} | ||
) | ||
squeeze: bool = attr.ib( | ||
default=False, | ||
metadata={"help_string": "Unfold one-element splits removing the list"}, | ||
) | ||
|
||
|
||
class Split(TaskBase): | ||
""" | ||
Task to split lists into multiple outputs | ||
|
||
Examples | ||
-------- | ||
>>> from pydra.tasks.core.sequences import Split | ||
>>> sp = Split(name="sp", splits=[5, 4, 3, 2, 1]) | ||
>>> out = sp(inlist=list(range(15))) | ||
>>> out.output.out1 | ||
[0, 1, 2, 3, 4] | ||
>>> out.output.out2 | ||
[5, 6, 7, 8] | ||
>>> out.output.out5 | ||
[14] | ||
""" | ||
|
||
_task_version = "1" | ||
input_spec = SplitInputSpec | ||
|
||
def __init__(self, splits, *args, **kwargs): | ||
self.output_spec = SpecInfo( | ||
name="Outputs", | ||
fields=[(f"out{i + 1}", list) for i in range(len(splits))], | ||
bases=(BaseSpec,), | ||
) | ||
super().__init__(*args, **kwargs) | ||
self.inputs.splits = splits | ||
|
||
def _run_task(self): | ||
self.output_ = {} | ||
left = 0 | ||
for i, split in enumerate(self.inputs.splits, 1): | ||
right = left + split | ||
self.output_[f"out{i}"] = self.inputs.inlist[left:right] | ||
left = right | ||
|
||
|
||
@attr.s(kw_only=True) | ||
class MergeInputSpec(BaseSpec): | ||
axis: Literal["vstack", "hstack"] = attr.ib( | ||
default="vstack", | ||
metadata={ | ||
"help_string": "Direction in which to merge, hstack requires same number of elements in each input." | ||
}, | ||
) | ||
no_flatten: bool = attr.ib( | ||
default=False, | ||
metadata={ | ||
"help_string": "Append to outlist instead of extending in vstack mode." | ||
}, | ||
) | ||
ravel_inputs: bool = attr.ib( | ||
default=False, | ||
metadata={"help_string": "Ravel inputs when no_flatten is False."}, | ||
) | ||
|
||
|
||
def _ravel(in_val): | ||
if not isinstance(in_val, list): | ||
return in_val | ||
flat_list = [] | ||
for val in in_val: | ||
raveled_val = _ravel(val) | ||
if isinstance(raveled_val, list): | ||
flat_list.extend(raveled_val) | ||
else: | ||
flat_list.append(raveled_val) | ||
return flat_list | ||
|
||
|
||
class Merge(TaskBase): | ||
""" | ||
Task to merge inputs into a single list | ||
|
||
``Merge(1)`` will merge a list of lists | ||
|
||
Examples | ||
-------- | ||
>>> from pydra.tasks.core.sequences import Merge | ||
>>> mi = Merge(3, name="mi") | ||
>>> mi.inputs.in1 = 1 | ||
>>> mi.inputs.in2 = [2, 5] | ||
>>> mi.inputs.in3 = 3 | ||
>>> out = mi() | ||
>>> out.output.out | ||
[1, 2, 5, 3] | ||
|
||
>>> merge = Merge(1, name="merge") | ||
>>> merge.inputs.in1 = [1, [2, 5], 3] | ||
>>> out = merge() | ||
>>> out.output.out | ||
[1, [2, 5], 3] | ||
|
||
>>> merge = Merge(1, name="merge") | ||
>>> merge.inputs.in1 = [1, [2, 5], 3] | ||
>>> merge.inputs.ravel_inputs = True | ||
>>> out = merge() | ||
>>> out.output.out | ||
[1, 2, 5, 3] | ||
|
||
>>> merge = Merge(1, name="merge") | ||
>>> merge.inputs.in1 = [1, [2, 5], 3] | ||
>>> merge.inputs.no_flatten = True | ||
>>> out = merge() | ||
>>> out.output.out | ||
[[1, [2, 5], 3]] | ||
""" | ||
|
||
_task_version = "1" | ||
output_spec = SpecInfo(name="Outputs", fields=[("out", ty.List)], bases=(BaseSpec,)) | ||
|
||
def __init__(self, numinputs, *args, **kwargs): | ||
self._numinputs = max(numinputs, 0) | ||
self.input_spec = SpecInfo( | ||
name="Inputs", | ||
fields=[(f"in{i + 1}", ty.List) for i in range(self._numinputs)], | ||
bases=(MergeInputSpec,), | ||
) | ||
super().__init__(*args, **kwargs) | ||
|
||
def _run_task(self): | ||
self.output_ = {"out": []} | ||
if self._numinputs < 1: | ||
return | ||
|
||
values = [ | ||
getattr(self.inputs, f"in{i + 1}") | ||
for i in range(self._numinputs) | ||
if getattr(self.inputs, f"in{i + 1}") is not attr.NOTHING | ||
] | ||
|
||
if self.inputs.axis == "vstack": | ||
for value in values: | ||
if isinstance(value, list) and not self.inputs.no_flatten: | ||
self.output_["out"].extend( | ||
_ravel(value) if self.inputs.ravel_inputs else value | ||
) | ||
else: | ||
self.output_["out"].append(value) | ||
else: | ||
lists = [ensure_list(val) for val in values] | ||
self.output_["out"] = [ | ||
[val[i] for val in lists] for i in range(len(lists[0])) | ||
] | ||
|
||
|
||
@pydra.mark.task | ||
def Select(inlist: MultiInputObj, index: MultiInputObj) -> MultiOutputObj: | ||
""" | ||
Task to select specific elements from a list | ||
|
||
Examples | ||
-------- | ||
|
||
>>> from pydra.tasks.core.sequences import Select | ||
>>> sl = Select(name="sl") | ||
>>> sl.inputs.inlist = [1, 2, 3, 4, 5] | ||
>>> sl.inputs.index = [3] | ||
>>> out = sl() | ||
>>> out.output.out | ||
4 | ||
|
||
>>> sl = Select(name="sl") | ||
>>> out = sl(inlist=[1, 2, 3, 4, 5], index=[3, 4]) | ||
>>> out.output.out | ||
[4, 5] | ||
|
||
""" | ||
return [inlist[i] for i in index] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now no way to update
output_spec
ifsplits
is computed and passed via a workflow. I'm not sure we could determine the output names in order to connect the outputs of this task if the number of splits is not known.We could instead do something like
max_splits
and create allout{i}
fields with the understanding that anything depending on an output that is not generated will receive aNone
(orattr.NOTHING
...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not following this i think. a task writer should not concern themselves with splits or combines. that's outside of the task's responsibility. it should simply take whatever input is given and work with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task splits a list. It has nothing to do with splitting/combining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
semantics are hard! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not exactly what you want, but we have also option to pass all outputs, by using
lzout.all_
: https://github.com/nipype/pydra/blob/master/pydra/engine/tests/test_workflow.py#L3662There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in nipype splits creates a named output for the different numbers of splits. an example is the following:
let's say a subject has M T1s and N T2s and i want them to be processed together. i can create a Merge node to aggregate them (merge also produces the relevant output to split them apart), do the processing and then split them back into 2 components (a list of processed T1s and a list of processed T2s. as a workflow designer i know that there are two categories here, what i don't ahead of time is the number. hence being able to set split would be required.
M,N can vary per subject, so cannot be determined up front.
another thought here is that the outputs would come from
lzout
, if we can't find an attribute at construction time we can check that attribute at runtime. it may still error out, but we should be able to evaluate that connection. @effigies - would that address your question, without having to define max_splits?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think if we could have static and dynamic output specs, then that could work. In the static case we would do name checking, and in the dynamic case we leave it to post-run.
In the specific case of
Split()
, you need to know how many splits you have, even if you don't know the specific number in each split. We could either do that dynamically by connections or we could do it statically by declaring a number of splits.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if you know the number of splits in
Split()
you could do it with normal python function. Could someone give me a simple workflow when this could be used. I'm probably missing something important, since I still don't see where I'd like to use it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't specifically care about
Split()
. I've literally never used it. Nonetheless the point is that there are numerous Nipype interfaces where the full set of inputs or outputs isn't known until the interface is instantiated. TheMerge()
example in this PR is one such that I have used and it's easier to write:Than:
Similarly, to do the same thing with
Split()
would involve repeatedly writing:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so it is design to mimic these simple python function, but we just want to allow people to use it instead of creating annotation. I'm fine with this, but I'm a bit afraid of using the name
Split
. PerhapsSplitTask
andMergeTask
? I know it's longer, but havingsplit
andSplit
- one used as a method, the other as a class can confuse people IMO