[architecture] Specify pipeline in task_decorator #111
Comments
Sorry, I'm new to looking at the code in this repository. To me, calling the class and naming the result main_pipeline at the beginning makes sense. Are you able to elaborate on your issue so I can understand it further? Are you suggesting that the Pipeline class shouldn't be declared as a global variable?
@Acribbs This concerns classes vs. instances. I am perfectly happy to define an instance as a global variable, but I don't see attaching a global dictionary to a class definition as appropriate. If you want a global dictionary, why not make it
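To make the class vs. instance distinction concrete, here is a minimal sketch. All three classes are hypothetical stand-ins written for illustration; none of them is ruffus code, and the names `ClassLevelPipeline`, `PlainPipeline` and `Registry` are assumptions.

```python
class ClassLevelPipeline:
    """Stand-in where the name lookup hangs off the class itself."""
    pipelines = {}  # one dict shared by every instance of the class

    def __init__(self, name):
        self.name = name
        ClassLevelPipeline.pipelines[name] = self


class PlainPipeline:
    """Stand-in with no class-level state."""
    def __init__(self, name):
        self.name = name


class Registry:
    """Hypothetical holder: the lookup is an attribute of an instance."""
    def __init__(self):
        self.pipelines = {}

    def new_pipeline(self, name):
        pipeline = PlainPipeline(name)
        self.pipelines[name] = pipeline
        return pipeline


# A module-level instance is still global, but a second, empty registry can
# be created (for example in a test) without touching any class definition.
registry = Registry()
registry.new_pipeline("main")
ClassLevelPipeline("main")
fresh_registry = Registry()
```

With the class-level dictionary there is exactly one lookup table for the lifetime of the process; with the instance-level one, `fresh_registry` starts empty while `registry` keeps its entry.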
Ah I see, I interpreted your issue wrongly. Thanks for the further explanation. What do you think, @AndreasHeger?
I think I am confusing Pipeline.pipelines with the original issue. Will update once I get to laptop.
OK, no worries. I'm looking at the code and am a little confused, but I've only really started to understand ruffus recently, so I wasn't surprised. If you say there may be some confusion, then I will wait for your comments. Thanks.
Expanded snippet:

    class task_decorator(object):
        """
        Forwards to functions within Task
        """
        def __init__(self, *decoratorArgs, **decoratorNamedArgs):
            """
            saves decorator arguments
            """
            self.args = decoratorArgs
            self.named_args = decoratorNamedArgs

        def __call__(self, task_func):
            """
            calls func in task with the same name as the class
            """
            # add task to main pipeline
            # check for duplicate tasks inside _create_task
            task = main_pipeline._create_task(task_func)

So we are saying that the pipeline should be configurable at some step, either in __call__:

        def __call__(self, task_func, pipeline=None):
            """
            calls func in task with the same name as the class
            """
            if pipeline is None:
                pipeline = main_pipeline
            # add task to the selected pipeline
            # check for duplicate tasks inside _create_task
            task = pipeline._create_task(task_func)

Or in __init__:

        def __init__(self, pipeline=None, *decoratorArgs, **decoratorNamedArgs):
            """
            saves decorator arguments
            """
            self.args = decoratorArgs
            self.named_args = decoratorNamedArgs
            if pipeline is None:
                pipeline = main_pipeline
            self.pipeline = pipeline

        def __call__(self, task_func):
            """
            calls func in task with the same name as the class
            """
            # add task to the selected pipeline
            # check for duplicate tasks inside _create_task
            task = self.pipeline._create_task(task_func)
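One practical difference between the two variants is how they interact with `@` syntax: Python calls the decorator object with the function as its only argument, so an extra `pipeline` parameter on `__call__` can only be supplied by applying the decorator by hand. The sketch below illustrates this with a hypothetical `StubPipeline` that only records task names; it is not ruffus's `Pipeline`, and returning `task_func` from `__call__` is an assumption, since the snippet above does not show a return value.

```python
class StubPipeline:
    """Stand-in for a pipeline; only records the task names it is given."""
    def __init__(self, name):
        self.name = name
        self.task_names = []

    def _create_task(self, task_func):
        self.task_names.append(task_func.__name__)
        return task_func


main_pipeline = StubPipeline("main")
side_pipeline = StubPipeline("side")


class call_time_decorator:
    """Variant one: the pipeline is an optional argument of __call__."""
    def __init__(self, *decoratorArgs, **decoratorNamedArgs):
        self.args = decoratorArgs
        self.named_args = decoratorNamedArgs

    def __call__(self, task_func, pipeline=None):
        if pipeline is None:
            pipeline = main_pipeline
        pipeline._create_task(task_func)
        return task_func  # assumption: the decorated function is handed back


# With @ syntax Python calls __call__(step_a) and nothing else, so the
# task can only ever land in main_pipeline.
@call_time_decorator("*.txt")
def step_a():
    pass


def step_b():
    pass


# Choosing another pipeline requires applying the decorator manually.
step_b = call_time_decorator("*.txt")(step_b, pipeline=side_pipeline)
```

Under these assumptions the `__init__` variant is the only one of the two that lets ordinary decorator syntax pick the pipeline.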
This is almost certainly an incompatibility between the two modes of ruffus: the functional paradigm and the object-oriented one. Certainly in the functional paradigm (which is the one that I think most people use), there is no concept of main and non-main pipelines: a pipeline is a file. I don't know how many people use the object-oriented interface, but I think part of the reason for it was to allow self-contained sub-pipelines.

If we wanted to allow this we should use the second format, because, once decorated, functions are never called directly. However, having the pipeline as the first argument would break all existing pipelines, so I suggest:

        def __init__(self, *decoratorArgs, **decoratorNamedArgs):
            """
            saves decorator arguments
            """
            self.args = decoratorArgs
            self.named_args = decoratorNamedArgs
            if pipeline in decoratorNamedArgs:
                self.pipeline = decoratorNamedArgs['pipeline']
            else:
                self.pipeline = main_pipeline

        def __call__(self, task_func):
            """
            calls func in task with the same name as the class
            """
            # add task to the selected pipeline
            # check for duplicate tasks inside _create_task
            task = self.pipeline._create_task(task_func)

But it would need to be well documented somewhere.
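A possible refinement of the keyword-based idea, sketched under assumptions: using `pop` reads the `pipeline` option and removes it in one step, so it does not stay among the stored named arguments. Whether a leftover `pipeline` key would matter depends on how `named_args` is consumed later, which the thread does not show. `StubPipeline` and `task_decorator_sketch` are hypothetical names, not ruffus code.

```python
class StubPipeline:
    """Stand-in for a pipeline; only records the task names it is given."""
    def __init__(self, name):
        self.name = name
        self.task_names = []

    def _create_task(self, task_func):
        self.task_names.append(task_func.__name__)
        return task_func


main_pipeline = StubPipeline("main")
sub_pipeline = StubPipeline("sub")


class task_decorator_sketch:
    def __init__(self, *decoratorArgs, **decoratorNamedArgs):
        self.args = decoratorArgs
        # pop() returns the option if present, else the default, and removes
        # the key so only the remaining named arguments are stored.
        self.pipeline = decoratorNamedArgs.pop("pipeline", main_pipeline)
        self.named_args = decoratorNamedArgs

    def __call__(self, task_func):
        self.pipeline._create_task(task_func)
        return task_func  # assumption: the decorated function is handed back


# Existing call style: no pipeline given, so the task goes to main_pipeline.
@task_decorator_sketch("in.txt", "out.txt")
def old_style():
    pass


# Opt-in call style: the pipeline is named explicitly.
@task_decorator_sketch("in.txt", "out.txt", pipeline=sub_pipeline)
def new_style():
    pass


decorator = task_decorator_sketch("in.txt", pipeline=sub_pipeline, extra=1)
```

Because the option is keyword-only in practice, decorators written without it keep their current behaviour.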
I am not very familiar with how ruffus is supposed to be used in general, but I would quite like to realise the functionality in my toy example, basically factoring out the "pipeline building" from the "job mapping" as two separate processes. My guess is that the current architecture was adopted to simplify certain commonly reused syntax, but at first glance it looks quite unfriendly to maintainers... Apologies if my guess is too wild here.

In terms of code, your proposal would trigger NameError. Using .get avoids that:

        def __init__(self, *decoratorArgs, **decoratorNamedArgs):
            """
            saves decorator arguments
            """
            self.args = decoratorArgs
            self.named_args = decoratorNamedArgs
            self.pipeline = decoratorNamedArgs.get('pipeline', main_pipeline)

        def __call__(self, task_func):
            """
            calls func in task with the same name as the class
            """
            # add task to the selected pipeline
            # check for duplicate tasks inside _create_task
            task = self.pipeline._create_task(task_func)
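The difference between the bare name and the string key can be checked in isolation. This is a small standalone sketch, not ruffus code; the helper names are hypothetical.

```python
def with_bare_name(named_args):
    """Mirrors `if pipeline in ...`: the bare name must be resolved first."""
    try:
        # `pipeline` is not defined anywhere in this module, so the lookup
        # fails before the membership test is ever evaluated.
        return pipeline in named_args
    except NameError:
        return "NameError"


def with_string_key(named_args, default):
    """Mirrors the `.get('pipeline', default)` form."""
    return named_args.get("pipeline", default)


results = [
    with_bare_name({}),
    with_bare_name({"pipeline": "sub"}),
    with_string_key({}, "main"),
    with_string_key({"pipeline": "sub"}, "main"),
]
```

The bare-name form fails whether or not the key is present, because the name lookup happens before `in` is applied.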
I'm not really sure about "supposed" to be used, only how I've seen people use it. Most people I know who use ruffus would use the syntax in simple.py (https://gist.github.com/IanSudbery/cc274d97d38fd88df34ca392b7a0d859#file-simple-py).
If we are ever keen to separate business logic from pipeline flow, we do as in seperate_logic.py (https://gist.github.com/IanSudbery/cc274d97d38fd88df34ca392b7a0d859#file-seperate_logic-py).
Personally I think this syntax is less abstract, cleaner, more readable and more straightforwardly pythonic, without the nested function calls etc. For me this concreteness and pure Python approach is why I prefer ruffus to snakemake, but I admit this is purely a matter of personal taste.
I see no reason not to enable separate pipelines to be specified, although I didn't think it was necessary for what you wanted to do. Mind you, my mind has always shrunk away whenever I've tried to look too hard under the hood of ruffus.
The code I posted shouldn't throw NameError because it checks for it first, but I agree your suggestion is equivalent but cleaner.
Dear Ian,
I was only trying to point out that you forgot to quote “pipeline” in your proposal, so the interpreter would look up the variable rather than evaluate the “in” check, but this is a minor, uninteresting point. Also note that you have a typo in the second decorator in “separate_logic.py”.
Thanks for your thoughtful reply. I totally agree that the choice of syntax is more a matter of taste than anything. I am struggling a bit to see how separate_logic.py separates the logic: the data flow through functions is still defined with decorators, and we are just creating an extra layer with the same signature here. It's true that we can save run_bwa() in a separate file, though, and we could call multiple functions within a transformed function. But then the suffix signature would not be associated with the inner run_bwa() call, only with the outer call. Surely we want to save the suffix pattern with the reused pipeline (or not?).
I guess the ultimate reason for my reluctance is the decorator itself, meaning I'm just not very comfortable seeing something hanging above my function... But once I started thinking about it, it actually makes some sense. Decorators here are actually not FOP, and that was why I was confused. The decorator defines a literal transformation of the data, whereas the function defines the contextual change. Decorators dictate how names change, whereas functions define how data change. It might be simpler to expand the two changes into individual functions rather than confining name changes to a pre-designed syntax. But this means ruffus is trying to provide a grammar for how name changes should be implemented.
I am curious how you would write a pipeline step that requires two arguments rather than one, though.
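The idea of expanding the name change and the data change into two individual functions can be sketched in plain Python. This is an illustration only: it does not use ruffus, and `rename_suffix`, `upper_case` and `run_step` are hypothetical names.

```python
def rename_suffix(name, old, new):
    """Name change: map an input name to an output name."""
    if not name.endswith(old):
        raise ValueError(f"{name!r} does not end with {old!r}")
    return name[: -len(old)] + new


def upper_case(text):
    """Data change: transform the content itself."""
    return text.upper()


def run_step(inputs, rename, transform):
    """Apply both changes to a mapping of name -> content."""
    return {rename(name): transform(content) for name, content in inputs.items()}


outputs = run_step(
    {"sample.txt": "acgt"},
    rename=lambda name: rename_suffix(name, ".txt", ".upper"),
    transform=upper_case,
)
```

Here the rename rule is an ordinary function passed in next to the transformation, rather than a fixed syntax attached above it.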
On 25 Mar 2019, at 22:37, Ian Sudbery wrote:
I am not very familiar with how ruffus is supposed to be used in general,
I'm not really sure about "supposed" to be used, only how I've seen people use it. Most people I know who use ruffus would use the syntax in simple.py (https://gist.github.com/IanSudbery/cc274d97d38fd88df34ca392b7a0d859#file-simple-py).
If we are ever keen to separate business logic from pipeline flow, we do as in seperate_logic.py (https://gist.github.com/IanSudbery/cc274d97d38fd88df34ca392b7a0d859#file-seperate_logic-py).
Personally I think this syntax is less abstract, cleaner, more readable and more straightforwardly pythonic, without the nested function calls etc. For me this concreteness and pure Python approach is why I prefer ruffus to snakemake, but I admit this is purely a matter of personal taste.
I see no reason not to enable separate pipelines to be specified, although I didn't think it was necessary for what you wanted to do. Mind you, my mind has always shrunk away whenever I've tried to look too hard under the hood of ruffus.
The code I posted shouldn't throw NameError because it checks for it first, but I agree your suggestion is equivalent but cleaner.
There is no reason to hard code the Pipeline to be main_pipeline. Can we please fix this?
ruffus/ruffus/task.py
Line 344 in 67d2f12