Justify use of -F flag
Add some ideas on how to introduce Snakemake
Fix some formatting
tbooth committed Jul 22, 2024
1 parent b8661bf commit 9e9f9c4
Showing 4 changed files with 52 additions and 14 deletions.
7 changes: 5 additions & 2 deletions episodes/01-introduction.md
@@ -27,7 +27,7 @@ For now we'll just look at one single file, `ref1_1.fq`.
In the terminal:

```bash
$ cd yeast
$ cd snakemake_data/yeast
$ ls reads

$ head -n8 reads/ref1_1.fq
@@ -121,6 +121,9 @@ indents, etc. we may see an error.
$ snakemake -j1 -F -p ref1_1.fq.count
```

For these early examples, we'll always run Snakemake with the `-j1`, `-F` and `-p` options. Later
we'll look more deeply at these and other available command-line options to Snakemake.

::::::::::::::::::::::::::::::::::::::: challenge

## Running Snakemake
@@ -143,7 +146,7 @@ What does the `-p` option in the `snakemake` command above do?

This is such a useful thing we don't know why it isn't the default! The `-j1` option is what
tells Snakemake to only run one process at a time, and we'll stick with this for now as it
makes things simpler. The `-F` option tells Snakemake to always overwrite output files, and
makes things simpler. The `-F` option tells Snakemake to always recreate output files, and
we'll learn about protected outputs much later in the course. Answer 4 is a total red-herring,
as Snakemake never prompts interactively for user input.

4 changes: 2 additions & 2 deletions episodes/09-performance.md
@@ -225,11 +225,11 @@ At this point in the course there may be a cluster demo...

::::::::::::::::::::::::::::::::::::::::::::::::::

{% comment %}
[comment]: # (
Photo credit: Cskiran
Sourced from Wikimedia Commons
CC-BY-SA-4.0
{% endcomment %}
)

*For reference, [this is a Snakefile](files/ep09.Snakefile) incorporating the changes made in
this episode.*
7 changes: 5 additions & 2 deletions index.md
@@ -26,7 +26,10 @@ In the planning phase of writing this course material we outlined some [learner

:::::::::::::::::::::::::::::::::::::::::: prereq

## Prerequisites
## Learner Prerequisites

See the [prerequisites](prereqs.html) page for a full list of skills and concepts we assume that
learners will know prior to taking this lesson. In brief:

This is an intermediate lesson and assumes learners have some prior experience in bioinformatics:

@@ -35,7 +38,7 @@ This is an intermediate lesson and assumes learners have some prior experience i
- Knowing about bioinformatics fundamentals like the [FASTQ file format
](https://en.wikipedia.org/wiki/FASTQ_format) and [read mapping
](https://en.wikipedia.org/wiki/Read_\(biology\)#NGS_and_read_mapping),
in order to understand the example workflow.
in order to understand the example workflows.

No previous knowledge of Snakemake or workflow systems, or Python programming, is assumed.

48 changes: 40 additions & 8 deletions instructors/instructor-notes.md
@@ -10,15 +10,17 @@ Prior to beginning the first lesson you want to say something about Snakemake. A
saying how you yourself came across Snakemake and how you use it in your own work is probably
the best approach.

Otherwise, the info on [https://snakemake.readthedocs.io](https://snakemake.readthedocs.io) should have everything you need, and the
"rolling paper" has a nice graphic showing the history of the Snakemake project.
Otherwise, the info on [https://snakemake.readthedocs.io](https://snakemake.readthedocs.io) should
have everything you need, and the [rolling paper](https://f1000research.com/articles/10-33/v2)
has a nice graphic (fig. 2) showing the history of the Snakemake project. As of July 2024 this
paper has over 1000 citations.

## When to use a workflow system?

Learners may ask when it is appropriate to use a system like Snakemake.

The paper [Workflow systems turn raw data into scientific knowledge](https://pubmed.ncbi.nlm.nih.gov/31477884/)
has a view on this:
The paper [Workflow systems turn raw data into scientific knowledge](
https://pubmed.ncbi.nlm.nih.gov/31477884/) has a view on this:

:::::::::::::::::::::::::::::::::::::: discussion

@@ -35,17 +37,22 @@ defined in one or two rules. Once you understand the fundamentals you are likely
Snakemake for even these simple tasks.

Having said this, not every data analysis task is suited to Snakemake, or in some cases you may
only want to use Snakemake for part of a task, and do the rest with regular scripting.
only want to use Snakemake for part of a task, and do the rest with, say, regular scripting.

## Which is the best workflow system to use?

Snakemake 🐍
Snakemake! 🐍

But, in seriousness, other workflow systems are available. Some are better suited to different
tasks, and some users have a preference for one over another. For a large task, it is worth
investigating multiple options before committing to an approach.
[This Git repository and associated paper](https://github.com/GoekeLab/bioinformatics-workflows)
comparing eight workflow systems is a good place to start.
comparing eight workflow systems is a good place to start. And in fact the previously mentioned
[Snakemake rolling paper](https://f1000research.com/articles/10-33/v2) compares Snakemake to
several other workflow systems.

You should also look through existing workflows on resources like [WorkflowHub](
https://workflowhub.eu), as someone may have already solved all or part of your problem.

## About the sample data files

@@ -58,14 +65,39 @@ really add anything to the course.
It's possible that a learner will accidentally delete or overwrite the input files. In this case,
note that a copy is available to download - see the link on [the setup page](../learners/setup.md).

## Choice of bioinformatics software

Like the toy dataset, the tools in this course are chosen to illustrate the workings of Snakemake.
The choice of older and simpler tools like *fastx toolkit* is deliberate, and reduces the burden of
maintenance of this course material as tools are updated.

In practice, learners may ask to go into more depth on the choice, configuration, and functionality
of the bioinformatics software. If you have the time and are confident talking about this then do
so, but if not then it is valid to reiterate that the focus of the course is on the orchestration
of analysis steps with Snakemake, not the choice of what software is best for any given analysis.

# Notes on specific episodes

## Episode 01 - Running commands with Snakemake

In the first few episodes we always run Snakemake with the `-F` flag, and it's not explained what
this does until Ep. 04. The rationale is that the default Snakemake behaviour when pruning the DAG
leads to learners seeing different output (typically the message "nothing to be done") when
repeating the exact same command. This can seem strange to learners who are used to scripting and
imperative programming.

The internal rules used by Snakemake to determine which jobs in the DAG are to be run, and which
skipped, are pretty complex, but the behaviour seen under `-F` is much simpler and more consistent;
Snakemake simply runs every job in the DAG every time. You can think of `-F` as disabling the lazy
evaluation feature of Snakemake, until we are ready to properly introduce and understand it.
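If it helps to make this contrast concrete when explaining it, the shell sketch below models the two behaviours side by side. It is a deliberate simplification: it assumes a make-style rule in which a job runs only when its output is missing or older than its input, whereas Snakemake's real rules are more involved. The `needs_run` function is hypothetical, and the file names only mirror the `ref1_1.fq` example from Episode 01.

```bash
#!/usr/bin/env bash
# Toy, make-style model of "should this job run?", for illustration only.
# This is NOT Snakemake's actual algorithm, which is considerably more involved.

# Usage: needs_run FORCE OUTPUT INPUT   (hypothetical helper)
#   FORCE is "yes" to mimic -F; anything else gives the default lazy behaviour.
# Returns 0 (run the job) or 1 (skip the job).
needs_run() {
    local force="$1" output="$2" input="$3"
    if [ "$force" = "yes" ]; then
        return 0    # -F: every job in the DAG runs, every time
    fi
    if [ ! -e "$output" ]; then
        return 0    # output is missing, so the job must run
    fi
    if [ "$input" -nt "$output" ]; then
        return 0    # input is newer than output, so the output is stale
    fi
    return 1        # output exists and is up to date: skip the job
}

# Demo in a scratch directory, cleaned up when the script exits
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
echo "@read1" > ref1_1.fq

# Repeat the exact same "command" twice with lazy evaluation
for run in 1 2; do
    if needs_run no ref1_1.fq.count ref1_1.fq; then
        echo "run $run (lazy): job runs"
        wc -l < ref1_1.fq > ref1_1.fq.count
    else
        echo "run $run (lazy): nothing to be done"
    fi
done

# The same again, but forced as with -F
if needs_run yes ref1_1.fq.count ref1_1.fq; then
    echo "run 3 (forced): job runs"
fi
```

Running it shows the job running on the first lazy pass, being skipped on the second, and running again when forced: the same command giving different results is exactly the pattern `-F` hides from learners in the early episodes.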

## Episode 03 - Chaining rules

There is a figure to illustrate the way Snakemake finds rules by wildcard matching and then tracks
back until it runs out of rule matches and finds a file that it already has. You may find that
showing an animated version of this is helpful, in which case
[there are some slides here](https://github.com/carpentries-incubator/snakemake-novice-bioinformatics/files/9299078/wildcard_demo.pptx).
[there are some slides here](
https://github.com/carpentries-incubator/snakemake-novice-bioinformatics/files/9299078/wildcard_demo.pptx).