skrakau edited this page May 25, 2017 · 41 revisions

Welcome to the PureCLIP wiki!

We use the PUM2 eCLIP data from ENCODE (Van Nostrand et al., 2016), preprocessed as described in the previous step. Alternatively, for testing you can download the preprocessed ENCODE data (you would then need to extract R2).

PureCLIP

PureCLIP starts with mapped reads. To be precise, it expects only the reads that carry information about potential truncation events: R1 for iCLIP data and R2 for eCLIP data.

~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g ref.fasta -o PureCLIP.crosslink_sites.bed -nt 10 -iv '1;2;3;'

With -iv you can specify the chromosomes (or transcripts) that are used to learn the parameters of PureCLIP's HMM. This reduces memory consumption and runtime. Usually, learning on a small subset of the chromosomes, e.g. Chr1-3, does not noticeably impair the results. However, for very sparse data the subset can be adjusted.
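The value passed to -iv in the command above is a semicolon-terminated list ('1;2;3;'). As a small convenience, the sketch below assembles such a string from separate arguments. `build_iv` is a hypothetical helper, not part of PureCLIP, and it makes no assumption about which identifiers your PureCLIP version expects; it only reproduces the list format shown above.

```shell
# Hypothetical helper (not part of PureCLIP): join its arguments into the
# semicolon-terminated list format used for -iv above, e.g. 1 2 3 -> 1;2;3;
build_iv() {
  iv=""
  for id in "$@"; do
    iv="${iv}${id};"
  done
  printf '%s\n' "$iv"
}

# Example: widen the learning subset from three to five entries.
build_iv 1 2 3 4 5
```

The resulting string can then be passed as `-iv "$(build_iv 1 2 3 4 5)"`.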

PureCLIP incorporating input control experiments

~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g hg19_ref.fasta -o PureCLIP.crosslink_sites.cov_inputSignal.bed -nt 10 -iv '1;2;3;' -g1g2k -ibam input_pum2.aligned.pooled.R2.bam -ibai input_pum2.aligned.pooled.R2.bam.bai

PureCLIP incorporating CL-motif scores

In order to incorporate CL-motifs into PureCLIP's model, we first need to compute position-wise CL-motif scores, which indicate each position's CL-affinity. You can skip the first two steps and use the provided precompiled list of common CL-motifs. However, it might be more accurate to use CL-motifs specific to the eCLIP experiment at hand.

  1. Detect crosslink sites within an input control experiment:

  2. Learn CL-motifs on these sites using DREME [cite]

  3. Use FIMO to compute motif occurrences associated with a score within your reference.

    ~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g hg19_ref.fasta -o PureCLIP.crosslink_sites.cov_CLmotifs.bed -nt 10 -iv '1;2;3;' -nim 4 -fis fimo_occurences.w10.m4.bed

PureCLIP's output

The main output of PureCLIP is a BED6 file, containing individual crosslink sites together with a score:

chromosome, start, start+1, state=3, score, strand
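Because each crosslink site is a single BED6 line with the score in column 5, standard text tools are enough for downstream filtering. The following is a minimal sketch, assuming tab-separated output; `filter_sites` and the score threshold are hypothetical and chosen only for illustration, and the sample lines are made up.

```shell
# Hypothetical helper: keep well-formed single-nucleotide sites
# (6 columns, end == start + 1) whose score in column 5 is >= $1.
filter_sites() {
  awk -v min="$1" 'BEGIN { FS = OFS = "\t" }
    NF == 6 && $3 == $2 + 1 && ($5 + 0) >= (min + 0)'
}

# Made-up example input; with real data, redirect the PureCLIP BED file instead.
printf 'chr1\t100\t101\t3\t2.5\t+\nchr1\t150\t151\t3\t0.5\t-\n' | filter_sites 2
```

The explicit `+ 0` forces a numeric comparison, so a score such as 10.25 is not compared as text against the threshold.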

Optionally, if an output file for binding regions is specified with --or, individual crosslink sites with a distance <= d (specified with --dm) are merged and written to a separate BED7 file:

chromosome, start, end, ., score, strand, score1;score2;score3;

where the 5th column is the sum of the crosslink site scores, while the 7th column contains the individual crosslink scores.
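To make the relationship between the two files concrete, the sketch below rebuilds BED7-style regions from BED6 sites. It is an illustration only, not PureCLIP's own implementation: `merge_sites` is a hypothetical helper, and it assumes that the distance is measured between the start coordinates of consecutive sites on the same chromosome and strand.

```shell
# Hypothetical helper: merge BED6 crosslink sites into BED7-style regions.
# $1 = maximum distance d between consecutive site starts; stdin = BED6.
merge_sites() {
  LC_ALL=C sort -k1,1 -k6,6 -k2,2n |
  awk -v d="$1" 'BEGIN { OFS = "\t" }
    # Print the region collected so far, if any.
    function emit() {
      if (n > 0) print chrom, start, end, ".", sum, strand, scores
    }
    {
      if (n > 0 && $1 == chrom && $6 == strand && $2 - last <= d) {
        # Close enough to the previous site: extend the current region.
        end = $3; sum += $5; scores = scores $5 ";"
      } else {
        # Otherwise finish the current region and start a new one.
        emit()
        chrom = $1; start = $2; end = $3; strand = $6
        sum = $5; scores = $5 ";"
      }
      last = $2; n++
    }
    END { emit() }'
}

# Made-up example: two sites 4 nt apart and one distant site, with d = 8.
printf 'chr1\t100\t101\t3\t2.5\t+\nchr1\t104\t105\t3\t3.25\t+\nchr1\t200\t201\t3\t1.5\t+\n' |
  merge_sites 8
```

In this example the first two sites form one region with summed score 5.75 and the 7th column `2.5;3.25;`, while the third site stays a region of its own.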
