Reconsider Putting output before input #46

Open
tbooth opened this issue Sep 21, 2023 · 1 comment
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17

Comments

@tbooth
Collaborator

tbooth commented Sep 21, 2023

From @jdblischak

You have the learners write the output field before the input field. Your motivation is that it is natural to work backwards when writing a Snakefile, e.g.:

Rather than listing steps in order of execution, you are always working backwards from the final desired result. The order of operations is determined by applying the pattern matching rules to the filenames, not by the order of the rules in the Snakefile.

This logic of working backwards from the desired output is why we’re putting the output lines first in all our rules - to remind us that these are what Snakemake looks at first!

I am not a fan of this approach for two main reasons:

1. Pretty much any other Snakefile they encounter or tutorial they read will list input before output; the official Snakemake tutorial is a concrete example. Having them write their Snakefiles differently from everyone else adds unnecessary cognitive load.
2. While it's true that Snakemake works backwards just like Make does, and it's important for learners to understand this mental model, I don't think a Snakemake user needs to design their pipeline backwards. I always develop my Snakemake pipelines one rule at a time, in forward order. While I have a vague sense of my final result, there are too many unknowns along the way. Inevitably I'll run into something frustrating, like mismatched chromosomes between my sequencing files and the reference files, and have to add a rule to fix it. In other words, I've never been able to follow your first step to "Define rules for all the processing steps". Even your lesson goes in forward order, starting with trimming and counting before adding rules for indexing and mapping.
So, as I said above, I don't think you need to change your lesson. But I would recommend adding some boxes, e.g.:

box: We recommend listing output before input to remind yourself how Snakemake processes the rules, but note that this is our personal preference. Most other Snakefiles you see will list input first.
box: You can also build your pipeline one step at a time in the forward direction. Just make sure to always keep in mind that Snakemake processes the rules backwards.

tbooth added the reviewer label (Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17) on Sep 21, 2023
@tbooth
Collaborator Author

tbooth commented Jul 17, 2024

It seems nobody likes my output/input/shell ordering. Per @cmeesters:

putting output before input is syntactically correct, but violates the de facto standard we use for workflows. It should not be introduced as good practice.

Lots of things that are common practice are bad practice, but it seems that I'm overruled here. I guess I'll look to switch the code around. Going back to the comments from @jdblischak, there's a conflation here between the order in which rules are added when designing a workflow and the order of output/input/shell within a single rule. I completely agree that Snakemake users should not be expected to "design their pipeline backwards", but this has nothing to do with how a rule is written. Snakemake DOES evaluate rules in the order of output, then input, then shell; this is not a design choice but a simple fact about how Snakemake works. I've recently answered a bunch of Snakemake questions on Stack Overflow, and almost half of them came from people who were struggling because they had not grasped this fundamental idea!
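The output, then input, then shell order can be illustrated with a toy backward-chaining resolver. This is a minimal sketch under stated assumptions, not Snakemake's actual implementation: `Rule`, `pattern_to_regex` and `resolve` are hypothetical names, the `trim`/`count` shell commands and the `ref1_1.fq` filename are placeholders, and real Snakemake does much more (timestamp checks, a full job DAG, `ruleorder` to settle ambiguous matches). What it does show is that a rule is chosen by its output pattern alone, the input can only be filled in from the wildcards that match produces, and the shell command comes last.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical toy rule; this is NOT Snakemake's internal data model.
    name: str
    output: str  # e.g. "{sample}.trimmed.fq"
    input: str   # e.g. "{sample}.fq"
    shell: str   # template filled in with {input}, {output} and wildcards


def pattern_to_regex(pattern):
    """Turn '{sample}.trimmed.fq' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even positions are literal text, odd positions are wildcard names.
    return re.compile("".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+)"
        for i, p in enumerate(parts)
    ))


def resolve(target, rules, existing, plan=None):
    """Work backwards from `target`; return shell commands in execution order."""
    if plan is None:
        plan = []
    if target in existing:
        return plan  # a source file: nothing needs to run
    # Step 1: only the OUTPUT patterns are consulted to pick a rule.
    matches = []
    for r in rules:
        m = pattern_to_regex(r.output).fullmatch(target)
        if m:
            matches.append((r, m))
    if not matches:
        raise ValueError(f"no rule produces {target!r} and it does not exist")
    if len(matches) > 1:
        raise ValueError(f"ambiguous: several rules produce {target!r}")
    rule, m = matches[0]
    wildcards = m.groupdict()
    # Step 2: the INPUT can only be filled in now, from the wildcard values.
    needed = rule.input.format(**wildcards)
    resolve(needed, rules, existing, plan)
    # Step 3: the SHELL command is built last, once input and output are known.
    cmd = rule.shell.format(input=needed, output=target, **wildcards)
    if cmd not in plan:
        plan.append(cmd)
    return plan


rules = [
    # Deliberately listed "out of order": counting comes before trimming.
    Rule("countreads", "{sample}.trimmed.fq.count", "{sample}.trimmed.fq",
         "count {input} > {output}"),
    Rule("trimreads", "{sample}.trimmed.fq", "{sample}.fq",
         "trim {input} > {output}"),
]

for cmd in resolve("ref1_1.trimmed.fq.count", rules, existing={"ref1_1.fq"}):
    print(cmd)
```

Although the rules are listed with counting before trimming, the resulting plan runs trimming first, because the order comes from matching filenames against output patterns rather than from the order of the rules in the file.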
