diff --git a/01-introduction.html b/01-introduction.html
index fcbff56..06a5064 100644
--- a/01-introduction.html
+++ b/01-introduction.html
@@ -569,7 +569,7 @@
This is such a useful thing we don’t know why it isn’t the default!
@@ -624,7 +624,7 @@
rule countreads:
output: "ref1_1.fq.count"
@@ -657,7 +657,7 @@ Counting sequences in Snakemake
-
+
You can choose whatever name you like for this second rule, but it
can’t be “countreads” as rule names need to be unique within a
diff --git a/02-placeholders.html b/02-placeholders.html
index 02fb1e1..eda7088 100644
--- a/02-placeholders.html
+++ b/02-placeholders.html
@@ -534,7 +534,7 @@
Running the general-purpose rule
-
+
After editing the file, run the commands:
@@ -574,7 +574,7 @@ Choosing the right wildcards
-
+
In all cases, there is no need to change the shell
part
of the rule at all.
@@ -702,7 +702,7 @@ BASH
Show me the solution
-
+
# Trim any FASTQ reads for base quality
rule trimreads:
diff --git a/05-the_dag.html b/05-the_dag.html
index e6cb418..ffdad78 100644
--- a/05-the_dag.html
+++ b/05-the_dag.html
@@ -490,7 +490,7 @@ How many jobs?
-
+
10 in total: 3 * kallisto_quant // 6 * trimreads // 1 *
kallisto_index // 0 * countreads
@@ -693,7 +693,7 @@ Visualising the effect of the -R
-
+
This is a way to make the Kallisto result in the first place:
diff --git a/06-expansion.html b/06-expansion.html
index dec97b5..a469d81 100644
--- a/06-expansion.html
+++ b/06-expansion.html
@@ -566,7 +566,7 @@ Counting all the reads
-
+
This will work.
# Input conditions and replicates to process
@@ -651,7 +651,7 @@ Combining the inputs of the
-
+
rule all_counts:
input:
@@ -771,7 +771,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
@@ -807,7 +807,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
diff --git a/07-awkward_programs.html b/07-awkward_programs.html
index a44ee6d..bc1d875 100644
--- a/07-awkward_programs.html
+++ b/07-awkward_programs.html
@@ -530,7 +530,7 @@ Adding a FastQC rule using the default output file nam
-
+
Since the shell
command is not to be changed, the output
names will be dictated by FastQC as we saw when running the command
@@ -586,7 +586,7 @@
OUTPUT
-
+
This involves using the {myfile} wildcard twice and then constructing
the output directory name to place in the -o
option to
@@ -719,7 +719,7 @@
Fixing FastQC to use our own output file names
-
+
This is one solution, using -o .
to tell FastQC to
produce the files in the current directory, then explicitly renaming
diff --git a/10-performance.html b/10-performance.html
index baacdb5..e6b6f3f 100644
--- a/10-performance.html
+++ b/10-performance.html
@@ -516,7 +516,7 @@
Measuring how concurrency affects execution time
-
+
The time will vary depending on the system configuration but
somewhere around 30 seconds is expected, and this should reduce to
@@ -594,7 +594,7 @@
Getting other programs to run with multiple threads
Show me the solution
-
+
For salmon_quant, -p {threads}
or equivalently
--threads {threads}
will work.
diff --git a/12-assembly_challenge.html b/12-assembly_challenge.html
index e5e328c..78a62dd 100644
--- a/12-assembly_challenge.html
+++ b/12-assembly_challenge.html
@@ -517,7 +517,7 @@ Running the Bash script
-
+
The commands to build and activate the Conda env can be found in the
“Conda Integration” chapter.
@@ -646,7 +646,7 @@ Challenge - building a full Snakemake workflow
-
+
A sample solution to this exercise is available here, along
with a suitable Conda
diff --git a/13-cleaning_up.html b/13-cleaning_up.html
index e248e81..a4b3ce8 100644
--- a/13-cleaning_up.html
+++ b/13-cleaning_up.html
@@ -411,7 +411,8 @@ Overview
Questions
- How do I save disk space by removing temporary files?
-- How do I protect important outputs from deletion?
+- How do I isolate the interim files created by jobs from each
+other?
@@ -419,7 +420,7 @@ Questions
Objectives
-- Understand the function of temporary outputs.
+- Understand how Snakemake manages temporary outputs
- Learn about running in --touch mode
- Learn about shadow mode rules
@@ -598,6 +599,8 @@ Protecting specific files
says that once an output has been produced it must not be overwritten.
In practise, Snakemake does this by revoking write permissions on the
files (as in chmod -w {output}
).
+We can do this for the Salmon and Kallisto indexes, for example, as
+these should only ever need to be generated once.
This works, but can be annoying because Snakemake will refuse to run
if it believes it needs to re-create a file which is protected. An
alternative suggestion is, once you have generated an important output
@@ -700,6 +703,7 @@
Key Points
- Cleaning up working files is good practise
- Make use of the
temporary()
function on outputs you
don’t need to keep
+- Protect outputs which are expensive to reproduce
- Shadow rules can solve issues with commands that produce unwanted
files
diff --git a/B1-quoting.html b/B1-quoting.html
index c81a59a..7047eb1 100644
--- a/B1-quoting.html
+++ b/B1-quoting.html
@@ -527,7 +527,7 @@ Adding a lenreads rule
-
+
It won’t work. Snakemake assumes that all parts of the string in
{curlies} are placeholders. The error will say something like
@@ -610,7 +610,7 @@
BASH
Show me the solution
-
+
This just involves adding :q
to a whole bunch of
placeholders. Unless you are very diligent it will probably take a few
diff --git a/aio.html b/aio.html
index 75270d6..daff36c 100644
--- a/aio.html
+++ b/aio.html
@@ -646,7 +646,7 @@
Running Snakemake
-
+
- Prints the shell commands that are being run to the terminal
@@ -707,7 +707,7 @@ Counting sequences in Snakemake
-
+
rule countreads:
output: "ref1_1.fq.count"
@@ -740,7 +740,7 @@ Counting sequences in Snakemake
-
+
You can choose whatever name you like for this second rule, but it
can’t be “countreads” as rule names need to be unique within a
@@ -931,7 +931,7 @@
Running the general-purpose rule
-
+
After editing the file, run the commands:
@@ -973,7 +973,7 @@ Choosing the right wildcards
-
+
In all cases, there is no need to change the shell
part
of the rule at all.
@@ -1118,7 +1118,7 @@ BASH
Show me the solution
-
+
# Trim any FASTQ reads for base quality
rule trimreads:
@@ -2021,7 +2021,7 @@ How many jobs?
-
+
10 in total: 3 * kallisto_quant // 6 * trimreads // 1 *
kallisto_index // 0 * countreads
@@ -2236,7 +2236,7 @@ Visualising the effect of the -R
-
+
This is a way to make the Kallisto result in the first place:
@@ -2503,7 +2503,7 @@ Counting all the reads
-
+
This will work.
# Input conditions and replicates to process
@@ -2592,7 +2592,7 @@ Combining the inputs of the
-
+
rule all_counts:
input:
@@ -2714,7 +2714,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
@@ -2750,7 +2750,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
@@ -2992,7 +2992,7 @@ Adding a FastQC rule using the default output file nam
-
+
Since the shell
command is not to be changed, the output
names will be dictated by FastQC as we saw when running the command
@@ -3049,7 +3049,7 @@
OUTPUT
-
+
This involves using the {myfile} wildcard twice and then constructing
the output directory name to place in the -o
option to
@@ -3184,7 +3184,7 @@
Fixing FastQC to use our own output file names
-
+
This is one solution, using -o .
to tell FastQC to
produce the files in the current directory, then explicitly renaming
@@ -4149,7 +4149,7 @@
Measuring how concurrency affects execution time
-
+
The time will vary depending on the system configuration but
somewhere around 30 seconds is expected, and this should reduce to
@@ -4234,7 +4234,7 @@
Getting other programs to run with multiple threads
Show me the solution
-
+
For salmon_quant, -p {threads}
or equivalently
--threads {threads}
will work.
@@ -4877,7 +4877,7 @@ Running the Bash script
-
+
The commands to build and activate the Conda env can be found in the
“Conda Integration” chapter.
@@ -5030,7 +5030,7 @@ Challenge - building a full Snakemake workflow
-
+
@@ -5088,7 +5089,7 @@ Questions
Objectives
-- Understand the function of temporary outputs.
+- Understand how Snakemake manages temporary outputs
- Learn about running in --touch mode
- Learn about shadow mode rules
@@ -5272,6 +5273,8 @@ Protecting specific files
says that once an output has been produced it must not be overwritten.
In practise, Snakemake does this by revoking write permissions on the
files (as in chmod -w {output}
).
+We can do this for the Salmon and Kallisto indexes, for example, as
+these should only ever need to be generated once.
This works, but can be annoying because Snakemake will refuse to run
if it believes it needs to re-create a file which is protected. An
alternative suggestion is, once you have generated an important output
@@ -5381,6 +5384,7 @@
Key Points
- Cleaning up working files is good practise
- Make use of the
temporary()
function on outputs you
don’t need to keep
+- Protect outputs which are expensive to reproduce
- Shadow rules can solve issues with commands that produce unwanted
files
diff --git a/files/ep13.Snakefile b/files/ep13.Snakefile
index 4059f48..bdb45e8 100644
--- a/files/ep13.Snakefile
+++ b/files/ep13.Snakefile
@@ -43,7 +43,7 @@ rule kallisto_quant:
rule kallisto_index:
output:
- idx = "{strain}.kallisto_index",
+ idx = protected("{strain}.kallisto_index"),
input:
fasta = "transcriptome/{strain}.cdna.all.fa.gz",
log: "{strain}.kallisto_log"
diff --git a/instructor/01-introduction.html b/instructor/01-introduction.html
index 24da416..ecee437 100644
--- a/instructor/01-introduction.html
+++ b/instructor/01-introduction.html
@@ -551,7 +551,7 @@ BASH
Use of the -F flag
-
+
In the first few episodes we always run Snakemake with the
-F
flag, and it’s not explained what this does until Ep.
@@ -594,7 +594,7 @@
Running Snakemake
-
+
- Prints the shell commands that are being run to the terminal
This is such a useful thing we don’t know why it isn’t the default!
@@ -631,7 +631,7 @@
OUTPUT
-
+
A command is presented to count the sequences in a FASTQ file:
$ echo $(( $(wc -l <file.fq) / 4 ))
@@ -666,7 +666,7 @@ Counting sequences in Snakemake
-
+
rule countreads:
output: "ref1_1.fq.count"
@@ -699,7 +699,7 @@ Counting sequences in Snakemake
-
+
You can choose whatever name you like for this second rule, but it
can’t be “countreads” as rule names need to be unique within a
diff --git a/instructor/02-placeholders.html b/instructor/02-placeholders.html
index 37994be..f2a3708 100644
--- a/instructor/02-placeholders.html
+++ b/instructor/02-placeholders.html
@@ -536,7 +536,7 @@
Running the general-purpose rule
-
+
After editing the file, run the commands:
@@ -576,7 +576,7 @@ Choosing the right wildcards
-
+
In all cases, there is no need to change the shell
part
of the rule at all.
@@ -704,7 +704,7 @@ BASH
Show me the solution
-
+
# Trim any FASTQ reads for base quality
rule trimreads:
diff --git a/instructor/05-the_dag.html b/instructor/05-the_dag.html
index e166e8b..b7c677f 100644
--- a/instructor/05-the_dag.html
+++ b/instructor/05-the_dag.html
@@ -492,7 +492,7 @@ How many jobs?
-
+
10 in total: 3 * kallisto_quant // 6 * trimreads // 1 *
kallisto_index // 0 * countreads
@@ -695,7 +695,7 @@ Visualising the effect of the -R
-
+
This is a way to make the Kallisto result in the first place:
diff --git a/instructor/06-expansion.html b/instructor/06-expansion.html
index bc20da2..f649ab2 100644
--- a/instructor/06-expansion.html
+++ b/instructor/06-expansion.html
@@ -568,7 +568,7 @@ Counting all the reads
-
+
This will work.
# Input conditions and replicates to process
@@ -653,7 +653,7 @@ Combining the inputs of the
-
+
rule all_counts:
input:
@@ -773,7 +773,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
@@ -809,7 +809,7 @@ ‘Globbing’ the list of samples
-
+
PYTHON
diff --git a/instructor/07-awkward_programs.html b/instructor/07-awkward_programs.html
index 1107af4..c04f858 100644
--- a/instructor/07-awkward_programs.html
+++ b/instructor/07-awkward_programs.html
@@ -532,7 +532,7 @@ Adding a FastQC rule using the default output file nam
-
+
Since the shell
command is not to be changed, the output
names will be dictated by FastQC as we saw when running the command
@@ -588,7 +588,7 @@
OUTPUT
-
+
This involves using the {myfile} wildcard twice and then constructing
the output directory name to place in the -o
option to
@@ -721,7 +721,7 @@
Fixing FastQC to use our own output file names
-
+
This is one solution, using -o .
to tell FastQC to
produce the files in the current directory, then explicitly renaming
diff --git a/instructor/10-performance.html b/instructor/10-performance.html
index 9b60055..c338ef1 100644
--- a/instructor/10-performance.html
+++ b/instructor/10-performance.html
@@ -518,7 +518,7 @@
Measuring how concurrency affects execution time
-
+
The time will vary depending on the system configuration but
somewhere around 30 seconds is expected, and this should reduce to
@@ -596,7 +596,7 @@
Getting other programs to run with multiple threads
Show me the solution
-
+
For