diff --git a/_posts/0002-06-01-Alignment_QC.md b/_posts/0002-06-01-Alignment_QC.md index 40498e2..564b9c4 100644 --- a/_posts/0002-06-01-Alignment_QC.md +++ b/_posts/0002-06-01-Alignment_QC.md @@ -37,7 +37,9 @@ Try filtering the BAM file to require or exclude certain flags. This can be done * [http://broadinstitute.github.io/picard/explain-flags.html](http://broadinstitute.github.io/picard/explain-flags.html) -Try requiring that alignments are 'paired' and 'mapped in a proper pair' (=3). Also filter out alignments that are 'unmapped', the 'mate is unmapped', and 'not primary alignment' (=268) +Try requiring that alignments are 'paired' and 'mapped in a proper pair' (=3). + +Also filter out alignments that are 'unmapped', the 'mate is unmapped', and 'not primary alignment' (=268) ```bash samtools view -f 3 -F 268 UHR.bam | head | column -t | less -S @@ -89,7 +91,7 @@ mv *fastqc.zip fastqc/ ### Using Picard You can use Picard to generate RNA-seq specific quality metrics and figures -In this section we need to create some additional formats of our reference files. +In this section we need to create some additional formats of our reference transcriptome files. Picard uses a "sequence dictionary" file for many commands (simply a list of reference sequences and their sizes) @@ -105,14 +107,15 @@ cd $RNA_HOME/refs # Create a .dict file for our reference java -jar $PICARD CreateSequenceDictionary -R chr22_with_ERCC92.fa -O chr22_with_ERCC92.dict -# Create a bed file of the location of ribosomal sequences in our reference (first extract from the gtf then convert to bed) +# Create a bed file of the location of ribosomal sequences in our reference (first extract them from the GTF then convert to BED format) # Note that here we pull all the "rrna" transcripts from the GTF. This is a good strategy for the whole transcriptome ... # ... but on chr22 there is very little "rrna" content, leading to 0 coverage for all samples, so we are also adding a single protein coding ribosomal gene "RRP7A" (normally we would not do this) -# Note that for the convert2bed command we will treat out GTF file as a GFF file to retain exon level features when converting it to BED + +# Note the convert2bed command will convert our GTF to BED format # "<" is used to feed the GTF file into the tool. ">2/dev/null" is used to throw away a harmless warning. "1>" is use to save our result to a file grep --color=none -i -P "rrna|rrp7a" chr22_with_ERCC92.gtf > ref_ribosome.gtf -convert2bed --input=gff --output=bed < ref_ribosome.gtf 2>/dev/null 1>ref_ribosome.bed +convert2bed --input=gtf --output=bed < ref_ribosome.gtf 2>/dev/null 1>ref_ribosome.bed # Create interval list file for the location of just the ribosomal sequences in our reference java -jar $PICARD BedToIntervalList -I ref_ribosome.bed -O ref_ribosome.interval_list -SD chr22_with_ERCC92.dict