Home

Welcome to the PureCLIP wiki!

Let's use the PUM2 eCLIP data from ENCODE (Van Nostrand et al., 2016), preprocessed as described in the previous step:

PureCLIP

~/pureclip -i pum2.aligned.pooled.R2.bam -bai pum2.aligned.pooled.R2.bam.bai -g ref.fasta -o PureCLIP.crosslink_sites.bed -nt 10 -iv '1;2;3;'

The --iv option specifies the chromosomes (or transcripts) used to learn the parameters of PureCLIP's HMM. Restricting training to a subset reduces memory consumption and runtime. Usually, learning on a small subset of the chromosomes, e.g. Chr1-3, does not noticeably impair the results. For very sparse data, however, this subset may need to be adjusted, e.g. by including more chromosomes.
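A minimal sketch of building the --iv value, assuming the reference has a samtools-style .fai index (tab-separated, contig name in the first column) and that the training contigs are simply the first few contigs listed there. The helper name iv_from_fai is hypothetical; it reproduces the '1;2;3;' format of the command above, including the trailing semicolon.

```python
def iv_from_fai(fai_lines, n_contigs=3):
    """Return a PureCLIP -iv string such as '1;2;3;' from .fai index lines.

    Assumption: the first n_contigs entries of the index are the contigs to
    train on. Each .fai line is tab-separated with the contig name first.
    """
    names = []
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        names.append(line.split("\t")[0])
        if len(names) == n_contigs:
            break
    # Mirror the trailing semicolon used in the example command.
    return "".join(name + ";" for name in names)
```

In practice the lines would come from the index file itself, e.g. with open("ref.fasta.fai") as fh: iv_from_fai(fh).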

PureCLIP incorporating input control experiments

~/pureclip -i {input.bam} -bai {input.bai} -g {input.ref} -o {output.states} -nt 10 -iv '1;2;3;' -g1g2k -ibam '/project/lincRNA_seq/eCLIP/PUM2_new/input_reads/STAR/Aligned.f.duplRm.so.R2.bam' -ibai '/project/lincRNA_seq/eCLIP/PUM2_new/input_reads/STAR/Aligned.f.duplRm.so.R2.bam.bai'
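When PureCLIP is driven from a workflow script (the {input.bam} placeholders suggest a Snakemake rule), the argument list can be assembled programmatically. A minimal sketch, assuming the binary is reachable as pureclip and that each .bai path is the BAM path plus ".bai", as in the examples; pureclip_argv is a hypothetical helper, and the input-control flags (-g1g2k, -ibam, -ibai) are appended unchanged from the command above.

```python
def pureclip_argv(bam, ref, out, iv="1;2;3;", threads=10, input_bam=None):
    """Assemble a PureCLIP argument list mirroring the example commands.

    Assumption: BAM indices live next to the BAMs with a '.bai' suffix.
    When input_bam is given, the input-control flags from the example
    command are appended.
    """
    argv = ["pureclip", "-i", bam, "-bai", bam + ".bai", "-g", ref,
            "-o", out, "-nt", str(threads), "-iv", iv]
    if input_bam is not None:
        argv += ["-g1g2k", "-ibam", input_bam, "-ibai", input_bam + ".bai"]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True); passing a list rather than a shell string avoids having to quote the semicolons in the -iv value.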

PureCLIP incorporating CL-motif scores

In order to incorporate CL-motifs into the model of PureCLIP, we first need to:

1. Detect crosslink sites within an input control experiment.
2. Learn CL-motifs on these sites using DREME [cite].
3. Use FIMO to compute scored motif occurrences within your reference.

You can skip the first two steps and use the provided precompiled list of common CL-motifs. However, CL-motifs learned from the specific eCLIP experiment at hand may give more accurate results.
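Learning motifs requires sequences rather than coordinates. A minimal sketch of turning crosslink sites into FASTA windows for a motif finder such as DREME, assuming the reference fits in memory as a dict mapping chromosome name to sequence. The helpers site_windows and to_fasta and the default flank of 10 nt are hypothetical choices, not part of PureCLIP.

```python
# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def site_windows(sites, genome, flank=10):
    """Yield (name, sequence) pairs centred on crosslink sites.

    sites: iterable of (chrom, start, strand) with 0-based start, as in BED.
    genome: dict mapping chromosome name to its sequence string.
    flank: hypothetical half-window size. Windows are clipped at the
    chromosome ends, and minus-strand windows are reverse-complemented.
    """
    for chrom, start, strand in sites:
        seq = genome[chrom]
        lo = max(0, start - flank)
        hi = min(len(seq), start + flank + 1)
        window = seq[lo:hi]
        if strand == "-":
            window = window.translate(COMPLEMENT)[::-1]
        yield f"{chrom}:{lo}-{hi}({strand})", window


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Reverse-complementing minus-strand windows keeps all sequences in the orientation of the RNA, so strand-specific motifs are not diluted by their complements.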

~/pureclip -i {input.bam} -bai {input.bai} -g {input.ref} -o {output.states} -nt 10 -iv '1;2;3;' -nim 4 -fis '/project/lincRNA_seq/eCLIP/PUM2_new/FIMO_CL_MOTIFS/hmm_1_HMM_SET0.top5000_matches_thresh0.01/fimo.final.w10.m4.bed'

PureCLIP's output

The main output of PureCLIP is a BED6 file, containing individual crosslink sites together with a score:

chromosome, start, start+1, state=3, score, strand
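A minimal sketch for reading the crosslink-site file back in, assuming tab-separated columns in the order listed above. CrosslinkSite and parse_sites are hypothetical names, and the state column is kept as a string because only its value (3) is documented here.

```python
from dataclasses import dataclass


@dataclass
class CrosslinkSite:
    chrom: str
    start: int    # 0-based; the BED end is start + 1
    state: str    # fourth column, 3 in the output described above
    score: float
    strand: str


def parse_sites(lines):
    """Parse PureCLIP BED6 crosslink-site lines into CrosslinkSite objects."""
    sites = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track")):
            continue  # skip blank, comment and track lines
        chrom, start, end, state, score, strand = line.rstrip("\n").split("\t")[:6]
        if int(end) != int(start) + 1:
            raise ValueError(f"expected a single-nucleotide site: {line!r}")
        sites.append(CrosslinkSite(chrom, int(start), state, float(score), strand))
    return sites
```

For example, sorted(parse_sites(fh), key=lambda s: s.score, reverse=True) ranks the sites by score.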

Optionally, if an output file for binding regions is specified with --or, individual crosslink sites with a distance <= d (specified with --dm) are merged and written to a separate BED7 file:

chromosome, start, end, ., score, strand, score1;score2;score3;

where the 5th column is the sum of the crosslink site scores, and the 7th column lists the individual crosslink site scores.
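The merging step can be sketched as follows, assuming sites are given as (chrom, start, score, strand) tuples and that "distance" means the difference between consecutive start positions on the same chromosome and strand. merge_sites is a hypothetical helper; PureCLIP's own implementation and number formatting may differ in detail.

```python
def merge_sites(sites, d):
    """Merge crosslink sites into BED7 region lines.

    sites: iterable of (chrom, start, score, strand) with 0-based starts.
    d: maximum distance (as with --dm) between consecutive sites in a region.
    Assumption: distance is measured between consecutive start positions.
    """
    ordered = sorted(sites, key=lambda s: (s[0], s[3], s[1]))
    regions = []  # each entry: [chrom, start, last_start, strand, scores]
    for chrom, start, score, strand in ordered:
        last = regions[-1] if regions else None
        if last and last[0] == chrom and last[3] == strand and start - last[2] <= d:
            last[2] = start            # extend the current region
            last[4].append(score)
        else:
            regions.append([chrom, start, start, strand, [score]])
    lines = []
    for chrom, start, last_start, strand, scores in regions:
        individual = "".join(f"{s:g};" for s in scores)
        lines.append("\t".join([chrom, str(start), str(last_start + 1), ".",
                                f"{sum(scores):g}", strand, individual]))
    return lines
```

Sorting by chromosome, strand and position first means each region only has to be compared with the most recent one, so the merge is a single linear pass after the sort.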
