Update ChIP-Seq tutorial with latest tool versions and links

galaxyproject · Jul 17, 2023 · 9875d80
1 parent 3b9ae5e

Showing 4 changed files with 2,000 additions and 3,679 deletions.
diff --git a/topics/epigenetics/images/formation_of_super-structures_on_xi/peak_heatmap_all.png b/topics/epigenetics/images/formation_of_super-structures_on_xi/peak_heatmap_all.png
diff --git a/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md b/topics/epigenetics/tutorials/formation_of_super-structures_on_xi/tutorial.md
@@ -30,6 +30,7 @@ contributors:
     - vivekbhr
     - fidelram
     - LeilyR
+    - pavanvidem
 ---
 
 # Introduction
@@ -126,8 +127,8 @@ During sequencing, errors are introduced, such as incorrect nucleotides being ca
 Sequence quality control is therefore an essential first step in your analysis. We use here similar tools as described in ["Quality control" tutorial]({{site.baseurl}}/topics/sequence-analysis): [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and [Trim Galore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).
 
 > <hands-on-title>Quality control</hands-on-title>
->
-> 1. Run **FastQC** {% icon tool %} with the following parameters
+> 
+> 1. Run {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters:
 >    - {% icon param-files %} *"Short read data from your current history"*: `wt_H3K4me3_read1` and `wt_H3K4me3_read2` (Input datasets selected with **Multiple datasets**)
 >
 >    {% snippet faqs/galaxy/tools_select_multiple_datasets.md %}
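FastQC's per-base quality report summarizes Phred scores position by position across all reads. The minimal sketch below reproduces only the mean of that summary. It assumes Phred+33 encoding (Sanger / Illumina 1.8+) and uses hypothetical quality strings. The FastQC boxplot also shows quartiles and other percentiles, which this sketch does not compute.

```python
def phred33_scores(quality_line: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality_per_position(quality_lines: list) -> list:
    """Mean Phred score at each read position, over all reads covering it."""
    totals, counts = [], []
    for line in quality_lines:
        for pos, q in enumerate(phred33_scores(line)):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += q
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical qualities: 'I' = Q40, '5' = Q20, '#' = Q2
print(mean_quality_per_position(["IIII5#", "III55#"]))
```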
@@ -193,7 +194,7 @@ It is often necessary to trim sequenced read, for example, to get rid of bases t
 
 > <hands-on-title>Trimming low quality bases</hands-on-title>
 >
-> 1. Run **Trim Galore!** {% icon tool %} with the following parameters
+> 1. Run {% tool [Trim Galore!](toolshed.g2.bx.psu.edu/repos/bgruening/trim_galore/trim_galore/0.6.7+galaxy0) %} with the following parameters:
 >    - *"Is this library paired- or single-end?"*: `Paired-end`
 >       - {% icon param-file %} *"Reads in FASTQ format"*: `wt_H3K4me3_read1` (Input dataset)
 >       - {% icon param-file %} *"Reads in FASTQ format"*: `wt_H3K4me3_read2` (Input dataset)
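Trim Galore! delegates quality trimming to cutadapt. The cutadapt documentation describes a BWA-style algorithm: subtract the cutoff from every quality, accumulate partial sums from the 3' end, and cut where that sum is minimal. The sketch below implements only this 3' step. It assumes Phred+33 qualities and a cutoff of 20, which is Trim Galore's default. Adapter removal and length filtering are not included, and the read is hypothetical.

```python
def trim_3prime_index(quality_line: str, cutoff: int = 20, base: int = 33) -> int:
    """Length to keep after BWA-style 3' quality trimming (as described for cutadapt)."""
    running = 0
    best = 0
    keep = len(quality_line)
    # Walk from the 3' end, accumulating (cutoff - quality)
    for i in range(len(quality_line) - 1, -1, -1):
        running += cutoff - (ord(quality_line[i]) - base)
        if running < 0:
            break  # good-quality bases now outweigh the poor tail
        if running > best:
            best = running
            keep = i  # sum of (quality - cutoff) from i to the end is the lowest seen so far
    return keep


read = "ACGTACGT"
quals = "IIIII5##"  # hypothetical: five Q40, one Q20, two Q2
keep = trim_3prime_index(quals)
print(read[:keep], quals[:keep])
```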
@@ -241,7 +242,7 @@ With ChiP sequencing, we obtain sequences corresponding to a portion of DNA link
 
 > <hands-on-title>Mapping</hands-on-title>
 >
-> 1. **Bowtie2** {% icon tool %} with
+> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.0+galaxy0) %} with the following parameters:
 >    - *"Is this single or paired library"*: `Paired-end`
 >        - {% icon param-file %} *"FASTA/Q file #1"*: `trimmed reads pair 1` (output of **Trim Galore!** {% icon tool %})
 >        - {% icon param-file %} *"FASTA/Q file #2"*: `trimmed reads pair 2` (output of **Trim Galore!** {% icon tool %})
@@ -256,9 +257,9 @@ With ChiP sequencing, we obtain sequences corresponding to a portion of DNA link
 >    > How many reads where mapped? Uniquely or several times?
 >    >
 >    > > <solution-title></solution-title>
->    > > The overall alignment rate is 98.64%. This score is quite high. If you have less than 70-80%, you should investigate the cause: contamination, etc.
+>    > > The overall alignment rate is 98.57%. This score is quite high. If you have less than 70-80%, you should investigate the cause: contamination, etc.
 >    > >
->    > > 43719 (90.27%) reads have been aligned concordantly exactly 1 time and 3340 (6.90%) aligned concordantly >1 times. The latter ones correspond to multiple mapped reads. Allowing for multiple  mapped reads increases the number of usable reads and the sensitivity of peak detection;
+>    > > 41514 (85.72%) reads have been aligned concordantly exactly 1 time and 5190 (10.72%) aligned concordantly >1 times. The latter ones correspond to multiple mapped reads. Allowing for multiple mapped reads increases the number of usable reads and the sensitivity of peak detection;
 >    > > however, the number of false positives may also increase.
 >    > {: .solution }
 >    {: .question}
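The solution quotes figures from Bowtie2's alignment summary, and the workflow test in this commit asserts on the `98.57% overall alignment rate` line. The sketch below shows one hedged way to extract those numbers with Python's `re` module. The excerpt was rebuilt from the figures in the solution and is not copied from a real log. The 70% threshold is the lower bound of the rule of thumb given in the text.

```python
import re


def parse_bowtie2_summary(text: str) -> dict:
    """Extract a few fields from Bowtie2's paired-end alignment summary."""
    stats = {}
    patterns = {
        "concordant_unique": r"(\d+) \(([\d.]+)%\) aligned concordantly exactly 1 time",
        "concordant_multi": r"(\d+) \(([\d.]+)%\) aligned concordantly >1 times",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
            stats[key + "_pct"] = float(match.group(2))
    match = re.search(r"([\d.]+)% overall alignment rate", text)
    if match:
        stats["overall_rate"] = float(match.group(1))
    return stats


def needs_investigation(stats: dict, threshold: float = 70.0) -> bool:
    """Flag a low overall alignment rate (lower end of the 70-80% rule of thumb)."""
    return stats.get("overall_rate", 0.0) < threshold


excerpt = """    41514 (85.72%) aligned concordantly exactly 1 time
    5190 (10.72%) aligned concordantly >1 times
98.57% overall alignment rate"""

stats = parse_bowtie2_summary(excerpt)
print(stats["overall_rate"], needs_investigation(stats))
```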
@@ -306,7 +307,7 @@ Since in this tutorial we are interested in assessing H3K4me3, H3K27me3 and CTCF
 >    ```
 >
 > 3. Rename the files
-> 3. **multiBamSummary** {% icon tool %} with the following parameters
+> 3. {% tool [multiBamSummary](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_multi_bam_summary/deeptools_multi_bam_summary/3.5.1.0.0) %} with the following parameters:
 >    - *"Sample order matters"*: `No`
 >       - {% icon param-files %} *"BAM/CRAM file"*: the 8 imported BAM files
 >    - *"Choose computation mode"*: `Bins`
@@ -322,7 +323,7 @@ Since in this tutorial we are interested in assessing H3K4me3, H3K27me3 and CTCF
 >
 >    Using these parameters, the tool will take bins of 1000 bp separated by 500 bp on the chromosome X. For each bin the overlapping reads in each sample will be computed and stored into a matrix.
 >
-> 4. **plotCorrelation** {% icon tool %} with the following parameters
+> 4. {% tool [plotCorrelation](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_correlation/deeptools_plot_correlation/3.5.1.0.0) %} with the following parameters:
 >    - {% icon param-files %} *"Matrix file from the multiBamSummary tool"*: `correlation matrix`(output of **multiBamSummary** {% icon tool %})
 >    - *"Correlation method"*: `Pearson`
 >
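In Bins mode as configured above, multiBamSummary counts reads in 1000 bp bins separated by 500 bp, and plotCorrelation then compares the sample columns of that matrix. The pure-Python sketch below mirrors both steps on toy read intervals. It keeps only full-length bins and counts a read once for each bin it overlaps. These are simplifying assumptions, not deepTools' exact rules.

```python
import math


def make_bins(chrom_length, bin_size=1000, distance=500):
    """Full-length bins of bin_size bp, skipping distance bp between consecutive bins."""
    step = bin_size + distance
    return [(s, s + bin_size) for s in range(0, chrom_length - bin_size + 1, step)]


def count_matrix(reads_per_sample, bins):
    """Rows are bins, columns are samples; each cell counts reads overlapping the bin."""
    return [
        [sum(1 for r_start, r_end in reads if r_start < b_end and r_end > b_start)
         for reads in reads_per_sample]
        for b_start, b_end in bins
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical reads as half-open (start, end) intervals on a 5 kb "chromosome"
sample_a = [(10, 60), (900, 1100), (1600, 1650), (3100, 3150), (3200, 3250)]
sample_b = [(20, 70), (1550, 1600), (3050, 3100), (3300, 3350), (3400, 3450)]
bins = make_bins(5000)
matrix = count_matrix([sample_a, sample_b], bins)
r = pearson([row[0] for row in matrix], [row[1] for row in matrix])
print(bins, matrix, round(r, 3))
```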
@@ -354,7 +355,7 @@ Similar to **multiBamSummary** {% icon tool %}, **plotFingerprint** {% icon tool
 
 > <hands-on-title>IP strength estimation</hands-on-title>
 >
-> 1. **plotFingerprint** {% icon tool %} with the following parameters
+> 1. {% tool [plotFingerprint](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_fingerprint/deeptools_plot_fingerprint/3.5.1.0.0) %} with the following parameters:
 >    - *"Sample order matters"*: `No`
 >       - {% icon param-files %} *"BAM/CRAM file"*: `wt_input_rep1` and `wt_H3K4me3_rep1`
 >    - *"Region of the genome to limit the operation to"*: `chrX`
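plotFingerprint sorts genomic bins by read count and plots the cumulative fraction of reads against bin rank. An input sample gives a near-diagonal curve, while a strongly enriched ChIP stays flat and then rises sharply at the end. The minimal sketch below computes that cumulative curve from hypothetical bin counts. It leaves out deepTools' bin sampling and plotting.

```python
def fingerprint(bin_counts):
    """Cumulative fraction of reads after sorting bins from lowest to highest count."""
    ordered = sorted(bin_counts)
    total = sum(ordered)
    if total == 0:
        return [0.0] * len(ordered)
    curve, running = [], 0
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve


print(fingerprint([5, 5, 5, 5]))   # input-like: reads spread evenly
print(fingerprint([99, 0, 1, 0]))  # ChIP-like: reads concentrated in one bin
```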
@@ -401,12 +402,12 @@ To learn how to do the normalization, we will take the `wt_H3K4me3_rep1` sample
 
 > <hands-on-title>Estimation of the sequencing depth</hands-on-title>
 >
-> 1. **IdxStats** {% icon tool %} with the following parameters
+> 1. {% tool [Samtools idxstats](toolshed.g2.bx.psu.edu/repos/devteam/samtools_idxstats/samtools_idxstats/2.0.4) %} with the following parameters:
 >    - {% icon param-files %} *"BAM file"*: `wt_H3K4me3_rep1.bam` and `wt_input_rep1.bam`
 >
 > > <question-title></question-title>
 > >
-> > 1. What is the output of **IdxStats** {% icon tool %}?
+> > 1. What is the output of **Samtools idxstats** {% icon tool %}?
 > > 2. How many reads has been mapped on chrX for the input and for the ChIP-seq samples?
 > > 3. Why are the number of reads different? And what could be the impact of this difference?
 > >
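Samtools idxstats prints one tab-separated line per reference sequence (name, sequence length, mapped reads, unmapped reads), followed by a final `*` line for unplaced reads. The sketch below extracts the chrX mapped count from that text. It then derives factors that scale both samples down to the smaller library, which is one simple way to compare samples of different depth. The counts and the sequence length are hypothetical, and this is not necessarily the scaling that deepTools applies.

```python
def mapped_reads(idxstats_text: str, chrom: str) -> int:
    """Return column 3 (mapped reads) of the idxstats line for chrom."""
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == chrom:
            return int(mapped)
    raise KeyError(chrom)


def scale_to_smallest(counts: dict) -> dict:
    """Factor per sample that brings its read count down to the smallest one."""
    smallest = min(counts.values())
    return {sample: smallest / n for sample, n in counts.items()}


# Hypothetical idxstats outputs (1000 is a placeholder length, not the real chrX length)
chip_stats = "chrX\t1000\t300\t0\n*\t0\t0\t5"
input_stats = "chrX\t1000\t600\t0\n*\t0\t0\t2"
counts = {"chip": mapped_reads(chip_stats, "chrX"), "input": mapped_reads(input_stats, "chrX")}
print(counts, scale_to_smallest(counts))
```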
@@ -424,7 +425,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera
 
 > <hands-on-title>Coverage file normalization</hands-on-title>
 >
-> 1. **bamCoverage** {% icon tool %} with the following parameters
+> 1. {% tool [bamCoverage](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_coverage/deeptools_bam_coverage/3.5.1.0.0) %} with the following parameters:
 >    - {% icon param-files %} *"BAM file"*: `wt_H3K4me3_rep1.bam` and `wt_input_rep1.bam`
 >    - *"Bin size in bases"*: `25`
 >    - *"Scaling/Normalization method"*: `Normalize coverage to 1x`
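The deepTools documentation describes 1x normalization (RPGC) as dividing each bin by a factor equal to (mapped reads × fragment length) / effective genome size. The hedged sketch below works through that arithmetic with made-up numbers. In this example, 1,000 reads with 200 bp fragments on a 100 kb effective genome give 2x depth, so every bin is halved.

```python
def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Inverse of sequencing depth: effective genome size / (mapped reads * fragment length)."""
    return effective_genome_size / (mapped_reads * fragment_length)


def normalize_bins(bin_values, factor):
    """Multiply every bin value by the scale factor."""
    return [value * factor for value in bin_values]


factor = rpgc_scale_factor(mapped_reads=1000, fragment_length=200, effective_genome_size=100_000)
print(factor, normalize_bins([4, 2, 0], factor))
```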
@@ -443,7 +444,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera
 >    > {: .solution }
 >    {: .question}
 >
-> 2. **bamCoverage** {% icon tool %} with the same parameters but
+> 2. {% tool [bamCoverage](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_coverage/deeptools_bam_coverage/3.5.1.0.0) %} with the same parameters but:
 >    - *"Coverage file format"*: `bigWig`
 >
 >    > <question-title></question-title>
@@ -455,7 +456,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera
 >    > {: .solution }
 >    {: .question}
 >
-> 3. **IGV** {% icon tool %} to inspect both signal coverages (input and ChIP samples) in IGV
+> 3. Use **IGV** {% icon tool %} to inspect both signal coverages (input and ChIP samples).
 >
 {: .hands_on}
 
@@ -476,7 +477,7 @@ To extract only the information induced by the immunoprecipitation, we normalize
 
 > <hands-on-title>Generation of input-normalized coverage files</hands-on-title>
 >
-> 1. **bamCompare** {% icon tool %} with the following parameters
+> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the following parameters:
 >    - {% icon param-file %} *"First BAM file (e.g. treated sample)"*: `wt_H3K4me3_rep1.bam`
 >    - {% icon param-file %} *"Second BAM file (e.g. control sample)"*: `wt_input_rep1.bam`
 >    - *"Bin size in bases"*: `50`
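In its log2-ratio mode, bamCompare reports for each bin the log2 of the scaled ChIP count divided by the scaled control count. The sketch below assumes a pseudocount of 1 is added to both counts, which we believe is the deepTools default. It takes the scale factors as inputs rather than computing them, and all counts are hypothetical.

```python
import math


def log2_ratio(chip, control, chip_scale=1.0, control_scale=1.0, pseudocount=1.0):
    """log2((scaled ChIP + pseudocount) / (scaled control + pseudocount)) for one bin."""
    return math.log2((chip * chip_scale + pseudocount) / (control * control_scale + pseudocount))


print(log2_ratio(7, 3))                       # enriched bin
print(log2_ratio(3, 7))                       # depleted bin
print(log2_ratio(600, 300, chip_scale=0.5))   # depth difference removed by scaling
```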
@@ -495,10 +496,10 @@ To extract only the information induced by the immunoprecipitation, we normalize
 >    > {: .solution }
 >    {: .question}
 >
-> 2. **bamCompare** {% icon tool %} with the same parameters but:
+> 2. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the same parameters but:
 >    - *"Coverage file format"*: `bigWig`
 >
-> 3. **IGV** {% icon tool %} to inspect the log2 ratio
+> 3. Use **IGV** {% icon tool %} to inspect the log2 ratio.
 >
 {: .hands_on}
 
@@ -519,7 +520,7 @@ We could see in the ChIP data some enriched regions (peaks). We now would like t
 
 > <hands-on-title>Peak calling</hands-on-title>
 >
-> 1. **MACS2 callpeak** {% icon tool %} with the following parameters
+> 1. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} with the following parameters:
 >    - *"Are you pooling Treatment Files?"*: `No`
 >       - {% icon param-file %} *"ChIP-Seq Treatment File"*: `wt_H3K4me3_rep1.bam`
 >    - *"Do you have a Control File?"*: `Yes`
@@ -561,7 +562,7 @@ We could see in the ChIP data some enriched regions (peaks). We now would like t
 >
 > > <solution-title></solution-title>
 > > 1. We can see 11 peaks (track below the genes).
-> > 2. Using **Filter** {% icon tool %} with `c2>151385260 and c3<152426526`, we found that the 11 peaks with fold changes between 3.81927 and 162.06572
+> > 2. Using {% tool [Filter](Filter1) %} with `c2>151385260 and c3<152426526`, we found 11 peaks with fold changes between 3.81927 and 162.06572.
 > > 4. On the 656 peaks on the full chromosome (number of lines of the original BED file) there are 252 peaks with FC>50 (using **Filter** {% icon tool %} with `c7>50`)
 > {: .solution }
 {: .question}
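Galaxy's Filter tool refers to columns as c1, c2, and so on. In a MACS2 narrowPeak file, c2 and c3 are therefore the peak start and end, and c7 is the fold enrichment. The sketch below reproduces the two filters from the solution in Python on two narrowPeak lines. The peak names and values are invented for illustration.

```python
def filter_peaks(lines, predicate):
    """Keep tab-separated lines where predicate(columns) is true; columns are 1-based like c1, c2, ..."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        columns = {i + 1: value for i, value in enumerate(fields)}
        if predicate(columns):
            kept.append(line)
    return kept


def in_region(c):
    # Same test as the Filter expression c2>151385260 and c3<152426526
    return int(c[2]) > 151385260 and int(c[3]) < 152426526


def high_fold_change(c):
    # Same test as the Filter expression c7>50
    return float(c[7]) > 50


peaks = [  # hypothetical narrowPeak lines
    "chrX\t151400000\t151401000\tpeak_1\t100\t.\t3.8\t10.0\t8.0\t500",
    "chrX\t152500000\t152501000\tpeak_2\t900\t.\t80.0\t90.0\t85.0\t500",
]
print(filter_peaks(peaks, in_region))
print(filter_peaks(peaks, high_fold_change))
```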
@@ -576,15 +577,15 @@ Since we already generated the required files for the H3K4me3 sample, let's make
 
 > <hands-on-title>Prepare the peaks and data for CTCF</hands-on-title>
 >
-> 1. **bamCompare** {% icon tool %} with the following parameters
+> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the following parameters:
 >    - {% icon param-file %} *"First BAM file (e.g. treated sample)"*: `wt_CTCF_rep1.bam`
 >    - {% icon param-file %} *"Second BAM file (e.g. control sample)"*: `wt_input_rep1.bam`
 >    - *"Bin size in bases"*: `50`
 >    - *"How to compare the two files"*: `Compute log2 of the number of reads ratio`
 >    - *"Coverage file format"*: `bigwig`
 >    - *"Region of the genome to limit the operation to"*: `chrX`
 > 2. Rename the output of **bamCompare** {% icon tool %} with the name of the sample
-> 3. **MACS2 callpeak** {% icon tool %} with the following parameters
+> 3. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} with the following parameters:
 >    - *"Are you pooling Treatment Files?"*: `No`
 >       - {% icon param-file %} *"ChIP-Seq Treatment File"*: `wt_CTCF_rep1.bam`
 >    - *"Do you have a Control File?"*: `Yes`
@@ -599,13 +600,14 @@ We can now concatenate the MACS2 outputs with the location of the peaks (concate
 
 > <hands-on-title>Prepare the peak coordinates</hands-on-title>
 >
-> 1. **Concatenate two datasets into one dataset** {% icon tool %} with the following parameters
->    - {% icon param-file %} *"Concatenate"*: output of **MACS2 callpeak** {% icon tool %} for `wt_CTCF_rep1`
->    - {% icon param-file %} *"with"*: output of **MACS2 callpeak** {% icon tool %} for `wt_H3K4me3_rep1`
-> 2. **SortBED** {% icon tool %} with the following parameters
->    - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **Concatenate** {% icon tool %}
-> 3. **MergeBED** {% icon tool %} with the following parameters
->    - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **SortBED** {% icon tool %}
+> 1. {% tool [Concatenate datasets](cat1) %} with the following parameters:
+>    - {% icon param-file %} *"Concatenate Dataset"*: output of **MACS2 callpeak** {% icon tool %} for `wt_CTCF_rep1`
+>    - Click on *"Insert Dataset"*:
+>       - For *"Dataset"*: select the output of **MACS2 callpeak** {% icon tool %} for `wt_H3K4me3_rep1`
+> 2. {% tool [bedtools SortBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_sortbed/2.30.0+galaxy2) %} with the following parameters:
+>    - {% icon param-file %} *"Sort the following BED/bedGraph/GFF/VCF/EncodePeak file"*: output of **Concatenate** {% icon tool %}
+> 3. {% tool [bedtools MergeBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed/2.30.0) %} with the following parameters:
+>    - {% icon param-file %} *"Sort the following BAM/BED/bedGraph/GFF/VCF/EncodePeak file"*: output of **SortBED** {% icon tool %}
 >
 {: .hands_on}
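Concatenate, SortBED and MergeBED together turn the two peak files into one set of non-overlapping regions. The minimal Python sketch below follows the same idea. It assumes bedtools merge's default behavior, which joins overlapping and book-ended intervals in half-open BED coordinates, and it uses hypothetical peaks.

```python
def sort_and_merge(intervals):
    """Sort (chrom, start, end) tuples, then merge overlapping or book-ended ones."""
    merged = []
    for chrom, start, end in sorted(intervals, key=lambda iv: (iv[0], iv[1])):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks standing in for the two MACS2 outputs
ctcf_peaks = [("chrX", 500, 600), ("chrX", 100, 200)]
h3k4me3_peaks = [("chrX", 150, 300), ("chrX", 600, 700), ("chrX", 1000, 1100)]
print(sort_and_merge(ctcf_peaks + h3k4me3_peaks))
```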
 
@@ -617,7 +619,7 @@ Optionally, we can also use **plotProfile** {% icon tool %} to create a profile
 
 > <hands-on-title>Plot the heatmap</hands-on-title>
 >
-> 1. **computeMatrix** {% icon tool %} with the following parameters:
+> 1. {% tool [computeMatrix](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_compute_matrix/deeptools_compute_matrix/3.5.1.0.0) %} with the following parameters:
 >    - *"Select regions"*:
 >       - {% icon param-file %} *"Regions to plot"*: output of **MergeBED** {% icon tool %}
 >    - *"Sample order matters"*: `No`
@@ -626,7 +628,7 @@ Optionally, we can also use **plotProfile** {% icon tool %} to create a profile
 >       - *"The reference point for the plotting"*: `center of region`
 >       - *"Distance upstream of the start site of the regions defined in the region file"*: `3000`
 >       - *"Distance downstream of the end site of the given regions"*: `3000`
-> 2. **plotHeatmap** {% icon tool %} with the following parameters
+> 2. {% tool [plotHeatmap](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_heatmap/deeptools_plot_heatmap/3.5.1.0.1) %} with the following parameters:
 >    - {% icon param-file %} *"Matrix file from the computeMatrix tool"*: `Matrix` (output of **computeMatrix** {% icon tool %})
 >    - *"Show advanced options"*: `yes`
 >       - *"Reference point label"*: select the right label
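With the reference point at the center of each region, computeMatrix scores a window running from 3000 bp upstream to 3000 bp downstream of that center. The sketch below only builds those windows and their bins. The 10 bp bin size is an assumption taken from deepTools' documented default. Edge cases, such as windows that run past the start of the chromosome, are ignored.

```python
def center_windows(regions, upstream=3000, downstream=3000, bin_size=10):
    """For each (chrom, start, end) region, list bins from center-upstream to center+downstream."""
    windows = []
    for chrom, start, end in regions:
        center = (start + end) // 2
        windows.append([
            (chrom, pos, pos + bin_size)
            for pos in range(center - upstream, center + downstream, bin_size)
        ])
    return windows


merged_regions = [("chrX", 10000, 10200)]  # hypothetical MergeBED output
bins = center_windows(merged_regions)[0]
print(len(bins), bins[0], bins[-1])
```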
@@ -658,24 +660,25 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa
 
 > <hands-on-title>(Optional) Plot the heatmap for all the samples</hands-on-title>
 >
-> 1. **bamCompare** {% icon tool %} for each combination input - ChIP data:
+> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} for each combination input - ChIP data:
 >     1. `wt_CTCF_rep1` - `wt_input_rep1` (already done)
 >     2. `wt_H3K4me3_rep1` - `wt_input_rep1` (already done)
 >     3. `wt_H3K27me3_rep1` - `wt_input_rep1`
 >     4. `wt_CTCF_rep2` - `wt_input_rep2`
 >     5. `wt_H3K4me3_rep2` - `wt_input_rep2`
 >     6. `wt_H3K27me3_rep2` - `wt_input_rep2`
 > 2. Rename the outputs of **bamCompare** {% icon tool %} with the name of the ChIP data
-> 3. **MACS2 callpeak** {% icon tool %} for each combination input - ChIP data
-> 4. **Concatenate datasets tail-to-head** {% icon tool %} with the following parameters
->     - {% icon param-file %} *"Concatenate Dataset"*: one output of **MACS2 callpeak** {% icon tool %}
->     - Click *"Insert Dataset"* and {% icon param-file %} *"Select"* one other output of **MACS2 callpeak** {% icon tool %}
->     - Redo for the 6 outputs of **MACS2 callpeak** {% icon tool %}
-> 5. **SortBED** {% icon tool %} with the following parameters
+> 3. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} for each combination input - ChIP data
+> 4. {% tool [Concatenate datasets](cat1) %} with the following parameters:
+>    - {% icon param-file %} *"Concatenate Dataset"*: one output of **MACS2 callpeak** {% icon tool %}
+>    - Click on *"Insert Dataset"*:
+>       - In *"Select"*: one other output of **MACS2 callpeak** {% icon tool %}
+>    - Redo for the 6 outputs of **MACS2 callpeak** {% icon tool %}
+> 5. {% tool [bedtools SortBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_sortbed/2.30.0+galaxy2) %} with the following parameters:
 >    - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **Concatenate** {% icon tool %}
-> 6. **MergeBED** {% icon tool %} with the following parameters
+> 6. {% tool [bedtools MergeBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed/2.30.0) %} with the following parameters:
 >    - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **SortBED** {% icon tool %}
-> 7. **computeMatrix** {% icon tool %} with the same parameters but:
+> 7. {% tool [computeMatrix](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_compute_matrix/deeptools_compute_matrix/3.5.1.0.0) %} with the same parameters but:
 >    - *"Select regions"*:
 >       - {% icon param-file %} *"Regions to plot"*: output of **MergeBED** {% icon tool %}
 >    - *"Sample order matters"*: `No`
@@ -684,13 +687,13 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa
 >       - *"The reference point for the plotting"*: `center of region`
 >       - *"Distance upstream of the start site of the regions defined in the region file"*: `3000`
 >       - *"Distance downstream of the end site of the given regions"*: `3000`
-> 8. **plotHeatmap** {% icon tool %} with the following parameters
+> 8. {% tool [plotHeatmap](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_heatmap/deeptools_plot_heatmap/3.5.1.0.1) %} with the following parameters:
 >    - {% icon param-file %} *"Matrix file from the computeMatrix tool"*: `Matrix` (output of **computeMatrix** {% icon tool %})
 >    - *"Show advanced options"*: `yes`
 >       - *"Reference point label"*: select the right label
 >       - *"Did you compute the matrix with more than one groups of regions?"*: `No, I used only one group`
 >           - *"Clustering algorithm"*: `Kmeans clustering`
->           - *"Number of clusters to compute"*: `2`
+>           - *"Number of clusters to compute"*: `6`
 {: .hands_on}
 
 > <question-title></question-title>
@@ -709,9 +712,9 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa
 > >
 > >     Target | Rep 1 | Rep 2
 > >     --- | --- | ---
-> >     CTCF | 2,688 | 2,062
-> >     H3K4me3 | 656 | 717
-> >     H3K27me3 | 221 | 76
+> >     CTCF | 2,672 | 2,061
+> >     H3K4me3 | 657 | 718
+> >     H3K27me3 | 220 | 73
 > >
 > >    The tendencies are similar for both replicates: more peaks for CTCF, less for H3K4me3 and only few for H3K27me3.
 > >
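In this commit, the all-samples heatmap step raises plotHeatmap's k-means cluster count from 2 to 6. To illustrate how k-means groups heatmap rows, the sketch below runs plain Lloyd's algorithm in Python with a deterministic initialization that uses the first k rows. deepTools' own implementation and initialization differ, so the cluster labels will not match its output. The profile rows are hypothetical.

```python
def kmeans(rows, k, iterations=20):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per row."""
    centroids = [list(r) for r in rows[:k]]  # deterministic init: first k rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        for i, row in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centroids[j])),
            )
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical heatmap rows: two flat profiles and two strong ones
profiles = [[0, 0, 0], [5, 5, 5], [0, 1, 0], [5, 6, 5]]
print(kmeans(profiles, k=2))
```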

diff --git a/...ormation_of_super-structures_on_xi/workflows/formation_of_super_structures_on_xi-test.yml b/...ormation_of_super-structures_on_xi/workflows/formation_of_super_structures_on_xi-test.yml
@@ -17,7 +17,7 @@
     wt_H3K4me3_rep1_bowtie2:
       asserts:
         has_text:
-          text: '98.64% overall alignment rate'
+          text: '98.57% overall alignment rate'
     wt_H3K4me3_rep1_idxstats:
       asserts:
         has_text:
@@ -29,7 +29,7 @@
     wt_H3K4me3_input_rep1_bamcompare:
       asserts:
         has_text:
-          text: '-0.874559'
+          text: '-0.710403'
     wt_H3K4me3_input_rep1_macs2:
       asserts:
         has_text: