Update ChIP-Seq tutorial with latest tool versions and links
pavanvidem committed Jul 17, 2023
1 parent 3b9ae5e commit 9875d80
Showing 4 changed files with 2,000 additions and 3,679 deletions.
@@ -30,6 +30,7 @@ contributors:
- vivekbhr
- fidelram
- LeilyR
- pavanvidem
---

# Introduction
@@ -126,8 +127,8 @@ During sequencing, errors are introduced, such as incorrect nucleotides being ca
Sequence quality control is therefore an essential first step in your analysis. We use here similar tools as described in ["Quality control" tutorial]({{site.baseurl}}/topics/sequence-analysis): [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and [Trim Galore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).

> <hands-on-title>Quality control</hands-on-title>
>
> 1. Run **FastQC** {% icon tool %} with the following parameters
>
> 1. Run {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters:
> - {% icon param-files %} *"Short read data from your current history"*: `wt_H3K4me3_read1` and `wt_H3K4me3_read2` (Input datasets selected with **Multiple datasets**)
>
> {% snippet faqs/galaxy/tools_select_multiple_datasets.md %}
@@ -193,7 +194,7 @@ It is often necessary to trim sequenced read, for example, to get rid of bases t

> <hands-on-title>Trimming low quality bases</hands-on-title>
>
> 1. Run **Trim Galore!** {% icon tool %} with the following parameters
> 1. Run {% tool [Trim Galore!](toolshed.g2.bx.psu.edu/repos/bgruening/trim_galore/trim_galore/0.6.7+galaxy0) %} with the following parameters:
> - *"Is this library paired- or single-end?"*: `Paired-end`
> - {% icon param-file %} *"Reads in FASTQ format"*: `wt_H3K4me3_read1` (Input dataset)
> - {% icon param-file %} *"Reads in FASTQ format"*: `wt_H3K4me3_read2` (Input dataset)
@@ -241,7 +242,7 @@ With ChiP sequencing, we obtain sequences corresponding to a portion of DNA link

> <hands-on-title>Mapping</hands-on-title>
>
> 1. **Bowtie2** {% icon tool %} with
> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.0+galaxy0) %} with the following parameters:
> - *"Is this single or paired library"*: `Paired-end`
> - {% icon param-file %} *"FASTA/Q file #1"*: `trimmed reads pair 1` (output of **Trim Galore!** {% icon tool %})
> - {% icon param-file %} *"FASTA/Q file #2"*: `trimmed reads pair 2` (output of **Trim Galore!** {% icon tool %})
@@ -256,9 +257,9 @@ With ChiP sequencing, we obtain sequences corresponding to a portion of DNA link
> > How many reads were mapped? Uniquely or several times?
> >
> > > <solution-title></solution-title>
> > > The overall alignment rate is 98.64%. This score is quite high. If you have less than 70-80%, you should investigate the cause: contamination, etc.
> > > The overall alignment rate is 98.57%. This score is quite high. If you have less than 70-80%, you should investigate the cause: contamination, etc.
> > >
> > > 43719 (90.27%) reads have been aligned concordantly exactly 1 time and 3340 (6.90%) aligned concordantly >1 times. The latter ones correspond to multiple mapped reads. Allowing for multiple mapped reads increases the number of usable reads and the sensitivity of peak detection;
> > > 41514 (85.72%) reads have been aligned concordantly exactly 1 time and 5190 (10.72%) aligned concordantly >1 times. The latter ones correspond to multiple mapped reads. Allowing for multiple mapped reads increases the number of usable reads and the sensitivity of peak detection;
> > > however, the number of false positives may also increase.
> > {: .solution }
> {: .question}
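
The solution above judges mapping quality from the Bowtie2 overall alignment rate, treating anything below roughly 70-80% as a reason to investigate. A minimal Python sketch of that check, assuming the Bowtie2 summary is available as text; the function names and the 70% default cutoff are illustrative choices, not part of the tutorial:

```python
import re


def overall_alignment_rate(summary: str) -> float:
    """Return the percentage from Bowtie2's 'overall alignment rate' line."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


def needs_investigation(rate: float, threshold: float = 70.0) -> bool:
    """True when the rate falls below the (illustrative) alarm threshold."""
    return rate < threshold


summary = "98.57% overall alignment rate"
rate = overall_alignment_rate(summary)
print(rate, needs_investigation(rate))  # 98.57 False
```
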
@@ -306,7 +307,7 @@ Since in this tutorial we are interested in assessing H3K4me3, H3K27me3 and CTCF
> ```
>
> 3. Rename the files
> 3. **multiBamSummary** {% icon tool %} with the following parameters
> 3. {% tool [multiBamSummary](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_multi_bam_summary/deeptools_multi_bam_summary/3.5.1.0.0) %} with the following parameters:
> - *"Sample order matters"*: `No`
> - {% icon param-files %} *"BAM/CRAM file"*: the 8 imported BAM files
> - *"Choose computation mode"*: `Bins`
@@ -322,7 +323,7 @@ Since in this tutorial we are interested in assessing H3K4me3, H3K27me3 and CTCF
>
> Using these parameters, the tool will take bins of 1000 bp separated by 500 bp on the chromosome X. For each bin the overlapping reads in each sample will be computed and stored into a matrix.
>
> 4. **plotCorrelation** {% icon tool %} with the following parameters
> 4. {% tool [plotCorrelation](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_correlation/deeptools_plot_correlation/3.5.1.0.0) %} with the following parameters:
> - {% icon param-files %} *"Matrix file from the multiBamSummary tool"*: `correlation matrix`(output of **multiBamSummary** {% icon tool %})
> - *"Correlation method"*: `Pearson`
>
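
The multiBamSummary step counts reads in 1000 bp bins separated by 500 bp, and plotCorrelation then computes a Pearson coefficient between samples. The sketch below illustrates that idea in plain Python. It simplifies deepTools' behaviour by assigning each read to a bin by its start position only (deepTools counts overlapping reads), and the read positions are made up:

```python
import math
from typing import List, Sequence


def bin_counts(read_starts: Sequence[int], chrom_length: int,
               bin_size: int = 1000, distance_between_bins: int = 500) -> List[int]:
    """Count reads per bin; bins are bin_size long and separated by a gap.

    Simplification: a read is assigned by its start position only.
    """
    step = bin_size + distance_between_bins
    if chrom_length < bin_size:
        return []
    n_bins = (chrom_length - bin_size) // step + 1
    counts = [0] * n_bins
    for pos in read_starts:
        index, offset = divmod(pos, step)
        # Reads starting in the gap between two bins are not counted.
        if offset < bin_size and index < n_bins:
            counts[index] += 1
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical read start positions on a 5 kb toy chromosome.
sample_a = bin_counts([10, 20, 1600, 1700, 1800, 3100], chrom_length=5000)
sample_b = bin_counts([50, 1200, 1550, 1560, 3050], chrom_length=5000)
print(sample_a, sample_b)                     # [2, 3, 1] [1, 2, 1]
print(round(pearson(sample_a, sample_b), 3))  # 0.866
```
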
@@ -354,7 +355,7 @@ Similar to **multiBamSummary** {% icon tool %}, **plotFingerprint** {% icon tool

> <hands-on-title>IP strength estimation</hands-on-title>
>
> 1. **plotFingerprint** {% icon tool %} with the following parameters
> 1. {% tool [plotFingerprint](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_fingerprint/deeptools_plot_fingerprint/3.5.1.0.0) %} with the following parameters:
> - *"Sample order matters"*: `No`
> - {% icon param-files %} *"BAM/CRAM file"*: `wt_input_rep1` and `wt_H3K4me3_rep1`
> - *"Region of the genome to limit the operation to"*: `chrX`
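
plotFingerprint ranks bins by coverage and plots the cumulative fraction of reads. A simplified sketch of that curve, assuming the per-bin read counts are already computed (the example counts are hypothetical). An evenly covered, input-like sample gives a roughly diagonal curve, while a strongly enriched ChIP sample stays low and then rises steeply in the last few bins:

```python
from typing import List, Sequence


def fingerprint_curve(bin_counts: Sequence[int]) -> List[float]:
    """Cumulative fraction of all reads, with bins ranked from lowest to highest coverage."""
    ranked = sorted(bin_counts)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no reads in any bin")
    curve: List[float] = []
    running = 0
    for count in ranked:
        running += count
        curve.append(running / total)
    return curve


# Hypothetical per-bin counts: an evenly covered input and a strongly enriched ChIP.
print(fingerprint_curve([5, 5, 5, 5]))   # [0.25, 0.5, 0.75, 1.0]
print(fingerprint_curve([19, 0, 1, 0]))  # [0.0, 0.0, 0.05, 1.0]
```
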
@@ -401,12 +402,12 @@ To learn how to do the normalization, we will take the `wt_H3K4me3_rep1` sample

> <hands-on-title>Estimation of the sequencing depth</hands-on-title>
>
> 1. **IdxStats** {% icon tool %} with the following parameters
> 1. {% tool [Samtools idxstats](toolshed.g2.bx.psu.edu/repos/devteam/samtools_idxstats/samtools_idxstats/2.0.4) %} with the following parameters:
> - {% icon param-files %} *"BAM file"*: `wt_H3K4me3_rep1.bam` and `wt_input_rep1.bam`
>
> > <question-title></question-title>
> >
> > 1. What is the output of **IdxStats** {% icon tool %}?
> > 1. What is the output of **Samtools idxstats** {% icon tool %}?
> > 2. How many reads have been mapped on chrX for the input and for the ChIP-seq samples?
> > 3. Why is the number of reads different? And what could be the impact of this difference?
> >
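
Samtools idxstats reports one tab-separated line per reference sequence: name, sequence length, number of mapped reads and number of unmapped reads, with a final `*` line for reads not placed on any sequence. A small sketch that extracts the mapped read counts and compares sequencing depth between a ChIP and an input sample; the numbers in the example are invented for illustration and are not the tutorial's results:

```python
from typing import Dict


def mapped_reads_per_sequence(idxstats_text: str) -> Dict[str, int]:
    """Map each reference name to its mapped-read count (third column of idxstats)."""
    counts: Dict[str, int] = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts


# Invented idxstats outputs for a ChIP and an input sample.
chip_stats = "chrX\t1000000\t120000\t0\n*\t0\t0\t3000\n"
input_stats = "chrX\t1000000\t80000\t0\n*\t0\t0\t2000\n"

chip_chrx = mapped_reads_per_sequence(chip_stats)["chrX"]
input_chrx = mapped_reads_per_sequence(input_stats)["chrX"]
# A ratio away from 1 means the raw coverage tracks are not directly comparable.
print(chip_chrx, input_chrx, chip_chrx / input_chrx)  # 120000 80000 1.5
```
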
@@ -424,7 +425,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera

> <hands-on-title>Coverage file normalization</hands-on-title>
>
> 1. **bamCoverage** {% icon tool %} with the following parameters
> 1. {% tool [bamCoverage](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_coverage/deeptools_bam_coverage/3.5.1.0.0) %} with the following parameters:
> - {% icon param-files %} *"BAM file"*: `wt_H3K4me3_rep1.bam` and `wt_input_rep1.bam`
> - *"Bin size in bases"*: `25`
> - *"Scaling/Normalization method"*: `Normalize coverage to 1x`
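
*Normalize coverage to 1x* should correspond to deepTools' RPGC (reads per genomic content) scaling, where the scale factor is the effective genome size divided by (mapped reads × fragment length), so that the average coverage becomes 1x. A sketch of that arithmetic with hypothetical inputs; the real tool also estimates the fragment length and applies read filters, which are not modelled here:

```python
def rpgc_scale_factor(mapped_reads: int, fragment_length: int,
                      effective_genome_size: int) -> float:
    """Factor that rescales raw coverage to an average of 1x over the effective genome."""
    sequenced_bases = mapped_reads * fragment_length
    return effective_genome_size / sequenced_bases


# Hypothetical inputs: 1 million reads, 200 bp fragments, 100 Mb effective genome.
factor = rpgc_scale_factor(1_000_000, 200, 100_000_000)
print(factor)      # 0.5
print(4 * factor)  # 2.0, i.e. a bin with raw coverage 4 is scaled down to 2
```
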
@@ -443,7 +444,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera
> > {: .solution }
> {: .question}
>
> 2. **bamCoverage** {% icon tool %} with the same parameters but
> 2. {% tool [bamCoverage](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_coverage/deeptools_bam_coverage/3.5.1.0.0) %} with the same parameters but
> - *"Coverage file format"*: `bigWig`
>
> > <question-title></question-title>
@@ -455,7 +456,7 @@ We are using **bamCoverage** {% icon tool %}. Given a BAM file, this tool genera
> > {: .solution }
> {: .question}
>
> 3. **IGV** {% icon tool %} to inspect both signal coverages (input and ChIP samples) in IGV
> 3. Use **IGV** {% icon tool %} to inspect both signal coverages (input and ChIP samples).
>
{: .hands_on}

@@ -476,7 +477,7 @@ To extract only the information induced by the immunoprecipitation, we normalize

> <hands-on-title>Generation of input-normalized coverage files</hands-on-title>
>
> 1. **bamCompare** {% icon tool %} with the following parameters
> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the following parameters:
> - {% icon param-file %} *"First BAM file (e.g. treated sample)"*: `wt_H3K4me3_rep1.bam`
> - {% icon param-file %} *"Second BAM file (e.g. control sample)"*: `wt_input_rep1.bam`
> - *"Bin size in bases"*: `50`
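
Choosing *Compute log2 of the number of reads ratio* makes bamCompare report, per bin, the log2 of the scaled ChIP count over the scaled input count. A minimal sketch, assuming a pseudocount of 1 is added to both counts to avoid dividing by zero (bamCompare exposes a pseudocount option) and using simple scale factors as a stand-in for its depth normalization. Negative values mean the input has more reads than the ChIP in that bin:

```python
import math


def log2_ratio(chip_count: float, input_count: float, chip_scale: float = 1.0,
               input_scale: float = 1.0, pseudocount: float = 1.0) -> float:
    """log2((scaled ChIP + pseudocount) / (scaled input + pseudocount)) for one bin."""
    return math.log2((chip_count * chip_scale + pseudocount) /
                     (input_count * input_scale + pseudocount))


# Hypothetical bin counts with equal sequencing depth (scale factors of 1).
print(log2_ratio(15, 3))  # 2.0: ChIP enriched over input
print(log2_ratio(0, 3))   # -2.0: more input than ChIP reads
```
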
@@ -495,10 +496,10 @@ To extract only the information induced by the immunoprecipitation, we normalize
> > {: .solution }
> {: .question}
>
> 2. **bamCompare** {% icon tool %} with the same parameters but:
> 2. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the same parameters but:
> - *"Coverage file format"*: `bigWig`
>
> 3. **IGV** {% icon tool %} to inspect the log2 ratio
> 3. Use **IGV** {% icon tool %} to inspect the log2 ratio.
>
{: .hands_on}

@@ -519,7 +520,7 @@ We could see in the ChIP data some enriched regions (peaks). We now would like t

> <hands-on-title>Peak calling</hands-on-title>
>
> 1. **MACS2 callpeak** {% icon tool %} with the following parameters
> 1. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} with the following parameters:
> - *"Are you pooling Treatment Files?"*: `No`
> - {% icon param-file %} *"ChIP-Seq Treatment File"*: `wt_H3K4me3_rep1.bam`
> - *"Do you have a Control File?"*: `Yes`
@@ -561,7 +562,7 @@ We could see in the ChIP data some enriched regions (peaks). We now would like t
>
> > <solution-title></solution-title>
> > 1. We can see 11 peaks (track below the genes).
> > 2. Using **Filter** {% icon tool %} with `c2>151385260 and c3<152426526`, we found that the 11 peaks have fold changes between 3.81927 and 162.06572
> > 2. Using {% tool [Filter](Filter1) %} with `c2>151385260 and c3<152426526`, we found that the 11 peaks have fold changes between 3.81927 and 162.06572
> > 4. Of the 656 peaks on the full chromosome (number of lines of the original BED file), 252 have FC>50 (using **Filter** {% icon tool %} with `c7>50`)
> {: .solution }
{: .question}
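
The Galaxy **Filter** expressions refer to 1-based columns: `c2` and `c3` are the peak start and end, and `c7` is the seventh column, which MACS2 fills with the fold enrichment in its narrowPeak output. The same two filters expressed in Python, on hypothetical peak lines (the helper names are illustrative):

```python
from typing import List

Peak = List[str]


def parse_peaks(text: str) -> List[Peak]:
    """Split tab-separated peak lines into lists of column strings."""
    return [line.split("\t") for line in text.strip().splitlines() if line]


def in_window(peaks: List[Peak], start: int, end: int) -> List[Peak]:
    """Equivalent of `c2>start and c3<end` (c2/c3 are 1-based, so indices 1 and 2)."""
    return [p for p in peaks if int(p[1]) > start and int(p[2]) < end]


def enriched(peaks: List[Peak], min_fold: float) -> List[Peak]:
    """Equivalent of `c7>min_fold`; column 7 (index 6) holds the MACS2 fold enrichment."""
    return [p for p in peaks if float(p[6]) > min_fold]


# Hypothetical narrowPeak lines:
# chrom, start, end, name, score, strand, fold, -log10 p, -log10 q, summit offset
peaks = parse_peaks(
    "chrX\t151400000\t151401000\tpeak_1\t500\t.\t62.5\t50.1\t45.2\t480\n"
    "chrX\t151500000\t151500800\tpeak_2\t90\t.\t3.9\t9.0\t7.1\t300\n"
    "chrX\t160000000\t160000500\tpeak_3\t700\t.\t80.0\t70.0\t65.0\t200\n"
)
print([p[3] for p in in_window(peaks, 151385260, 152426526)])  # ['peak_1', 'peak_2']
print([p[3] for p in enriched(peaks, 50)])                      # ['peak_1', 'peak_3']
```
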
@@ -576,15 +577,15 @@ Since we already generated the required files for the H3K4me3 sample, let's make

> <hands-on-title>Prepare the peaks and data for CTCF</hands-on-title>
>
> 1. **bamCompare** {% icon tool %} with the following parameters
> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} with the following parameters:
> - {% icon param-file %} *"First BAM file (e.g. treated sample)"*: `wt_CTCF_rep1.bam`
> - {% icon param-file %} *"Second BAM file (e.g. control sample)"*: `wt_input_rep1.bam`
> - *"Bin size in bases"*: `50`
> - *"How to compare the two files"*: `Compute log2 of the number of reads ratio`
> - *"Coverage file format"*: `bigwig`
> - *"Region of the genome to limit the operation to"*: `chrX`
> 2. Rename the output of **bamCompare** {% icon tool %} with the name of the sample
> 3. **MACS2 callpeak** {% icon tool %} with the following parameters
> 3. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} with the following parameters
> - *"Are you pooling Treatment Files?"*: `No`
> - {% icon param-file %} *"ChIP-Seq Treatment File"*: `wt_CTCF_rep1.bam`
> - *"Do you have a Control File?"*: `Yes`
@@ -599,13 +600,14 @@ We can now concatenate the MACS2 outputs with the location of the peaks (concate

> <hands-on-title>Prepare the peak coordinates</hands-on-title>
>
> 1. **Concatenate two datasets into one dataset** {% icon tool %} with the following parameters
> - {% icon param-file %} *"Concatenate"*: output of **MACS2 callpeak** {% icon tool %} for `wt_CTCF_rep1`
> - {% icon param-file %} *"with"*: output of **MACS2 callpeak** {% icon tool %} for `wt_H3K4me3_rep1`
> 2. **SortBED** {% icon tool %} with the following parameters
> - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **Concatenate** {% icon tool %}
> 3. **MergeBED** {% icon tool %} with the following parameters
> - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **SortBED** {% icon tool %}
> 1. {% tool [Concatenate datasets](cat1) %} with the following parameters:
> - {% icon param-file %} *"Concatenate Dataset"*: output of **MACS2 callpeak** {% icon tool %} for `wt_CTCF_rep1`
> - Click on *"Insert Dataset"*:
> - For *"Dataset"*: select the output of **MACS2 callpeak** {% icon tool %} for `wt_H3K4me3_rep1`
> 2. {% tool [bedtools SortBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_sortbed/2.30.0+galaxy2) %} with the following parameters:
> - {% icon param-file %} *"Sort the following BED/bedGraph/GFF/VCF/EncodePeak file"*: output of **Concatenate** {% icon tool %}
> 3. {% tool [bedtools MergeBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed/2.30.0) %} with the following parameters:
> - {% icon param-file %} *"Sort the following BAM/BED/bedGraph/GFF/VCF/EncodePeak file"*: output of **SortBED** {% icon tool %}
>
{: .hands_on}
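
Concatenating the two peak files, sorting them and merging overlapping intervals yields one set of non-redundant regions. A sketch of the sort-and-merge logic in Python, assuming intervals are `(chrom, start, end)` tuples in BED's 0-based half-open convention; like `bedtools merge` with its defaults, it also joins book-ended intervals (one starting exactly where the previous one ends). The coordinates are hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def sort_and_merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals by chrom then start, and merge overlapping or book-ended ones."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks from the two MACS2 runs, concatenated as in step 1.
ctcf_peaks = [("chrX", 100, 200), ("chrX", 500, 600)]
h3k4me3_peaks = [("chrX", 150, 300), ("chrX", 600, 650), ("chrX", 1000, 1100)]
print(sort_and_merge(ctcf_peaks + h3k4me3_peaks))
# [('chrX', 100, 300), ('chrX', 500, 650), ('chrX', 1000, 1100)]
```
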

@@ -617,7 +619,7 @@ Optionally, we can also use **plotProfile** {% icon tool %} to create a profile

> <hands-on-title>Plot the heatmap</hands-on-title>
>
> 1. **computeMatrix** {% icon tool %} with the following parameters:
> 1. {% tool [computeMatrix](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_compute_matrix/deeptools_compute_matrix/3.5.1.0.0)%} with the following parameters:
> - *"Select regions"*:
> - {% icon param-file %} *"Regions to plot"*: output of **MergeBED** {% icon tool %}
> - *"Sample order matters"*: `No`
@@ -626,7 +628,7 @@ Optionally, we can also use **plotProfile** {% icon tool %} to create a profile
> - *"The reference point for the plotting"*: `center of region`
> - *"Distance upstream of the start site of the regions defined in the region file"*: `3000`
> - *"Distance downstream of the end site of the given regions"*: `3000`
> 2. **plotHeatmap** {% icon tool %} with the following parameters
> 2. {% tool [plotHeatmap](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_heatmap/deeptools_plot_heatmap/3.5.1.0.1) %} with the following parameters
> - {% icon param-file %} *"Matrix file from the computeMatrix tool"*: `Matrix` (output of **computeMatrix** {% icon tool %})
> - *"Show advanced options"*: `yes`
> - *"Reference point label"*: select the right label
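
With the reference point at the centre of each region and 3000 bp on either side, computeMatrix summarizes the signal over a fixed 6 kb window per region, split into bins. The sketch below only builds the bin coordinates of one such window; the 10 bp bin size is an assumption (believed to be computeMatrix's default), the exact centring may differ slightly in deepTools, and the region is hypothetical. Each matrix row would then hold one summary value per bin:

```python
from typing import List, Tuple


def reference_point_bins(start: int, end: int, upstream: int = 3000,
                         downstream: int = 3000, bin_size: int = 10) -> List[Tuple[int, int]]:
    """Bin coordinates of a window centred on the middle of a region."""
    center = (start + end) // 2
    window_start = center - upstream
    window_end = center + downstream
    return [(s, min(s + bin_size, window_end))
            for s in range(window_start, window_end, bin_size)]


# Hypothetical merged region.
bins = reference_point_bins(10_000, 10_100)
print(len(bins), bins[0], bins[-1])  # 600 (7050, 7060) (13040, 13050)
```
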
@@ -658,24 +660,25 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa

> <hands-on-title>(Optional) Plot the heatmap for all the samples</hands-on-title>
>
> 1. **bamCompare** {% icon tool %} for each combination input - ChIP data:
> 1. {% tool [bamCompare](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_bam_compare/deeptools_bam_compare/3.5.1.0.0) %} for each combination input - ChIP data:
> 1. `wt_CTCF_rep1` - `wt_input_rep1` (already done)
> 2. `wt_H3K4me3_rep1` - `wt_input_rep1` (already done)
> 3. `wt_H3K27me3_rep1` - `wt_input_rep1`
> 4. `wt_CTCF_rep2` - `wt_input_rep2`
> 5. `wt_H3K4me3_rep2` - `wt_input_rep2`
> 6. `wt_H3K27me3_rep2` - `wt_input_rep2`
> 2. Rename the outputs of **bamCompare** {% icon tool %} with the name of the ChIP data
> 3. **MACS2 callpeak** {% icon tool %} for each combination input - ChIP data
> 4. **Concatenate datasets tail-to-head** {% icon tool %} with the following parameters
> - {% icon param-file %} *"Concatenate Dataset"*: one output of **MACS2 callpeak** {% icon tool %}
> - Click *"Insert Dataset"* and {% icon param-file %} *"Select"* one other output of **MACS2 callpeak** {% icon tool %}
> - Redo for the 6 outputs of **MACS2 callpeak** {% icon tool %}
> 5. **SortBED** {% icon tool %} with the following parameters
> 3. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} for each combination input - ChIP data
> 4. {% tool [Concatenate datasets](cat1) %} with the following parameters:
> - {% icon param-file %} *"Concatenate Dataset"*: one output of **MACS2 callpeak** {% icon tool %}
> - Click on *"Insert Dataset"*:
> - In *"Select"*: one other output of **MACS2 callpeak** {% icon tool %}
> - Redo for the 6 outputs of **MACS2 callpeak** {% icon tool %}
> 5. {% tool [bedtools SortBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_sortbed/2.30.0+galaxy2) %} with the following parameters
> - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **Concatenate** {% icon tool %}
> 6. **MergeBED** {% icon tool %} with the following parameters
> 6. {% tool [bedtools MergeBED](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed/2.30.0) %} with the following parameters
> - {% icon param-file %} *"Sort the following bed,bedgraph,gff,vcf file"*: output of **SortBED** {% icon tool %}
> 7. **computeMatrix** {% icon tool %} with the same parameters but:
> 7. {% tool [computeMatrix](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_compute_matrix/deeptools_compute_matrix/3.5.1.0.0)%} with the same parameters but:
> - *"Select regions"*:
> - {% icon param-file %} *"Regions to plot"*: output of **MergeBED** {% icon tool %}
> - *"Sample order matters"*: `No`
@@ -684,13 +687,13 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa
> - *"The reference point for the plotting"*: `center of region`
> - *"Distance upstream of the start site of the regions defined in the region file"*: `3000`
> - *"Distance downstream of the end site of the given regions"*: `3000`
> 8. **plotHeatmap** {% icon tool %} with the following parameters
> 8. {% tool [plotHeatmap](toolshed.g2.bx.psu.edu/repos/bgruening/deeptools_plot_heatmap/deeptools_plot_heatmap/3.5.1.0.1) %} with the following parameters
> - {% icon param-file %} *"Matrix file from the computeMatrix tool"*: `Matrix` (output of **computeMatrix** {% icon tool %})
> - *"Show advanced options"*: `yes`
> - *"Reference point label"*: select the right label
> - *"Did you compute the matrix with more than one groups of regions?"*: `No, I used only one group`
> - *"Clustering algorithm"*: `Kmeans clustering`
> - *"Number of clusters to compute"*: `2`
> - *"Number of clusters to compute"*: `6`
{: .hands_on}

> <question-title></question-title>
@@ -709,9 +712,9 @@ So far, we have only analyzed 2 samples, but we can do the same for all the 6 sa
> >
> > Target | Rep 1 | Rep 2
> > --- | --- | ---
> > CTCF | 2,688 | 2,062
> > H3K4me3 | 656 | 717
> > H3K27me3 | 221 | 76
> > CTCF | 2,672 | 2,061
> > H3K4me3 | 657 | 718
> > H3K27me3 | 220 | 73
> >
> > The tendencies are similar for both replicates: more peaks for CTCF, fewer for H3K4me3 and only a few for H3K27me3.
> >
@@ -17,7 +17,7 @@
wt_H3K4me3_rep1_bowtie2:
asserts:
has_text:
text: '98.64% overall alignment rate'
text: '98.57% overall alignment rate'
wt_H3K4me3_rep1_idxstats:
asserts:
has_text:
@@ -29,7 +29,7 @@
wt_H3K4me3_input_rep1_bamcompare:
asserts:
has_text:
text: '-0.874559'
text: '-0.710403'
wt_H3K4me3_input_rep1_macs2:
asserts:
has_text: