Home
Welcome to the PureCLIP wiki!
We use the PUM2 eCLIP data from ENCODE (Van Nostrand et al., 2016), preprocessed as described in the previous step. Alternatively, for testing you can download the preprocessed ENCODE data (you would need to extract R2 from it).
PureCLIP takes mapped reads as input. More precisely, it expects only the reads that carry information about potential truncation events: R1 for iCLIP data and R2 for eCLIP data.
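To make the "R2 only" requirement concrete, here is a minimal Python sketch of how second-in-pair reads can be recognised in SAM text records. It relies only on the SAM FLAG bit 0x80 ("last segment in the template"). The function names and the example records are made up for illustration; in practice this filtering is normally done on the BAM file with a dedicated tool rather than by hand.

```python
R2_FLAG = 0x80  # SAM FLAG bit: read is the second (last) segment of its pair


def is_second_in_pair(sam_line: str) -> bool:
    """Return True if a SAM alignment record has the 0x80 FLAG bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
    return bool(flag & R2_FLAG)


def keep_r2(sam_lines):
    """Yield header lines unchanged and only those alignment records that are R2."""
    for line in sam_lines:
        if line.startswith("@") or is_second_in_pair(line):
            yield line


# Hypothetical records: FLAG 99 marks a first-in-pair read, FLAG 147 a second-in-pair read.
example = [
    "@SQ\tSN:1\tLN:1000\n",
    "readA\t99\t1\t100\t60\t50M\t=\t200\t150\t*\t*\n",
    "readA\t147\t1\t200\t60\t50M\t=\t100\t-150\t*\t*\n",
]
kept = list(keep_r2(example))
```

Here `kept` contains the header line and the record with FLAG 147. For iCLIP data the analogous check would use bit 0x40 (first in pair).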
~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g ref.fasta -o PureCLIP.crosslink_sites.bed -nt 10 -iv '1;2;3;'
With -iv you can specify the chromosomes (or transcripts) used to learn the parameters of PureCLIP's HMM, which reduces memory consumption and runtime. Learning on a small subset of the chromosomes, e.g. Chr1-3, usually does not noticeably impair the results. For very sparse data, however, this selection can be adjusted.
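The value passed with -iv in the command above is a semicolon-terminated list of sequence names ('1;2;3;'). The following Python sketch builds such a string and checks the chosen names against the headers of the reference FASTA. It assumes that the names must match the sequence names in the reference (for example '1' versus 'chr1'); that requirement and both helper functions are illustrative assumptions, not part of PureCLIP.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names (first word of each '>' header) from FASTA lines."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.append(header.split()[0])
    return names


def build_iv_argument(selected, available):
    """Return a string such as '1;2;3;' after checking every name is available."""
    known = set(available)
    missing = [name for name in selected if name not in known]
    if missing:
        raise ValueError(f"not found in reference: {', '.join(missing)}")
    return "".join(f"{name};" for name in selected)


# Hypothetical reference with Ensembl-style names (no 'chr' prefix).
reference = [">1 dna:chromosome", "ACGT", ">2", "ACGT", ">3", "GGCC", ">MT", "AC"]
names = fasta_sequence_names(reference)
iv_value = build_iv_argument(["1", "2", "3"], names)
```

With this reference, asking for "chr1" raises a `ValueError`, which catches a naming mismatch before a long run is started.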
~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g hg19_ref.fasta -o PureCLIP.crosslink_sites.cov_inputSignal.bed -nt 10 -iv '1;2;3;' -g1g2k -ibam input_pum2.aligned.pooled.R2.bam -ibai input_pum2.aligned.pooled.R2.bam.bai
To incorporate CL-motifs into the PureCLIP model, we first need to compute position-wise CL-motif scores, which indicate each position's CL-affinity. You can skip the first two steps and use the provided precompiled list of common CL-motifs. However, it may be more accurate to use CL-motifs specific to your eCLIP experiment.
- Detect crosslink sites within an input control experiment:
- Learn CL-motifs on these sites using DREME [cite]
- Use FIMO to compute motif occurrences associated with a score within your reference.
~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g hg19_ref.fasta -o PureCLIP.crosslink_sites.cov_CLmotifs.bed -nt 10 -iv '1;2;3;' -nim 4 -fis fimo_occurences.w10.m4.bed
The main output of PureCLIP is a BED6 file, containing individual crosslink sites together with a score:
chromosome, start, start+1, state=3, score, strand
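As a rough illustration of working with this output, the Python sketch below parses lines with the six columns listed above and ranks the sites by score. It assumes tab-separated fields, as is usual for BED files. The `CrosslinkSite` type, the `parse_site` helper and the example values are made up for illustration.

```python
from typing import NamedTuple


class CrosslinkSite(NamedTuple):
    chrom: str
    start: int   # 0-based position of the crosslink site
    end: int     # start + 1
    state: str   # 4th column, e.g. "3"
    score: float
    strand: str


def parse_site(line: str) -> CrosslinkSite:
    """Parse one line of the crosslink site BED6 output."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    return CrosslinkSite(fields[0], int(fields[1]), int(fields[2]),
                         fields[3], float(fields[4]), fields[5])


# Made-up example lines, not real PureCLIP results.
lines = [
    "1\t1000\t1001\t3\t12.5\t+\n",
    "1\t1004\t1005\t3\t30.25\t+\n",
    "2\t500\t501\t3\t7.0\t-\n",
]
sites = [parse_site(line) for line in lines]
ranked = sorted(sites, key=lambda site: site.score, reverse=True)
```

After this, `ranked[0]` is the highest-scoring site, which is a convenient starting point for inspecting results in a genome browser.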
Optionally, if an output file for binding regions is specified with --or, individual crosslink sites with a distance <= d (specified with --dm) are merged and written to a separate BED7 file:
chromosome, start, end, ., score, strand, score1;score2;score3;
where the 5th column is the sum of the crosslink site scores, while the 7th column contains the individual crosslink site scores.
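The merging step can be sketched in Python as follows. This is not PureCLIP's implementation. It assumes that "distance" means the difference between the positions of consecutive sites on the same chromosome and strand, and that a region ends at the last merged site's start + 1. The function names and example values are hypothetical.

```python
def merge_sites(sites, max_dist):
    """Merge (chrom, start, score, strand) sites whose gap to the previous site is <= max_dist."""
    regions = []
    # Sort so that sites on the same chromosome and strand are adjacent and ordered by position.
    for chrom, start, score, strand in sorted(sites, key=lambda s: (s[0], s[3], s[1])):
        last = regions[-1] if regions else None
        if (last is not None and last["chrom"] == chrom and last["strand"] == strand
                and start - (last["end"] - 1) <= max_dist):
            last["end"] = start + 1          # extend the region to include this site
            last["scores"].append(score)
        else:
            regions.append({"chrom": chrom, "start": start, "end": start + 1,
                            "strand": strand, "scores": [score]})
    return regions


def format_region(region):
    """Format a region with the seven columns described above."""
    total = sum(region["scores"])                           # 5th column: sum of site scores
    per_site = "".join(f"{s:g};" for s in region["scores"])  # 7th column: individual scores
    return "\t".join([region["chrom"], str(region["start"]), str(region["end"]),
                      ".", f"{total:g}", region["strand"], per_site])


# Made-up sites: two close sites on '+', one distant site on '+', one site on '-'.
example_sites = [("1", 100, 2.5, "+"), ("1", 103, 1.5, "+"),
                 ("1", 200, 4.0, "+"), ("1", 101, 3.0, "-")]
region_lines = [format_region(r) for r in merge_sites(example_sites, max_dist=8)]
```

With `max_dist=8`, the sites at positions 100 and 103 on the '+' strand form one region spanning 100 to 104 with summed score 4, while the site at 200 and the '-' strand site each form their own region.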