SmartSeq2 scRNASeq QC Metrics
Sequencing QC metrics and their visualization not only provide an overall view of experiment quality but also play an important role in troubleshooting quality problems and improving library construction.
Tables of metrics provide an overview of alignment statistics, RNA sequencing quality, and more.
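The metric names used on this page match those found in Picard-style metrics files. As a minimal sketch, assuming the tables are in that tab-separated format (a `## METRICS CLASS` line followed by a header row and one or more data rows), such a file could be read as below. The function name `parse_picard_metrics` and the example values are hypothetical and only illustrate the layout.

```python
import io


def parse_picard_metrics(handle):
    """Return the metrics rows of a Picard-style metrics file as a list of dicts.

    Assumes the layout: comment lines, a '## METRICS CLASS' line, a
    tab-separated header line, data rows, then a blank line.
    """
    lines = iter(handle)
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")

    header = next(lines, "").rstrip("\n").split("\t")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            break  # a blank line ends the metrics section
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Made-up example content, not taken from the dataset discussed here.
example_text = (
    "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics\n"
    "CATEGORY\tTOTAL_READS\tPCT_PF_READS_ALIGNED\tPF_MISMATCH_RATE\n"
    "PAIR\t1000\t0.75\t0.002\n"
    "\n"
)
rows = parse_picard_metrics(io.StringIO(example_text))
print(rows[0]["PCT_PF_READS_ALIGNED"])
```

Values are returned as strings, so numeric columns need to be converted before plotting or comparison.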
Alignment metrics give an overall idea of the alignment quality of your libraries. One important metric is `PCT_PF_READS_ALIGNED`, which indicates the percentage of pass-filter (PF) reads mapped to the reference genome. Another important metric is `PF_MISMATCH_RATE`, which reflects overall alignment quality. An example is shown below.
RNA metrics provide an important summary based on gene annotation. `PCT_USABLE_BASES` indicates the percentage of bases mapped to the transcriptome (mRNA, including UTR regions) and gives an overall view of RNA sequencing quality. High values of `PCT_INTRONIC_BASES`, `PCT_INTERGENIC_BASES`, and `PCT_RIBOSOMAL_BASES` indicate low-quality or degraded RNA. A high `MEDIAN_3PRIME_BIAS` also suggests degraded RNA. An example is shown below.
Insert size metrics provide basic information on insert sizes for paired-end libraries. They can be used to confirm that paired-end libraries were constructed as expected. An example is shown below.
Duplication metrics report the level of duplication after alignment. Duplicates are identified from alignment coordinates, not from the raw FASTQ data. An example is shown below.
In this task, we applied an scRNA-Seq pipeline to the published dataset GSE47872. We selected single-cell samples that include primary Glioblastoma cells and Gliomasphere Cell Line cells. The sample counts by read length are listed below:
| Cell type | 25bp | 100bp |
|---|---|---|
| Glioblastoma | 581 | 96 |
| Gliomasphere Cell Line | 195 | 0 |
We collected all metrics into a single table and visualized several important metrics, as shown below.
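Collecting per-cell metrics into one table can be sketched as follows. This is only an illustration under stated assumptions: the input is a dictionary of per-cell metric dictionaries, the `CELL_ID` column, the `NA` placeholder, the function names, and all values are hypothetical and not taken from the pipeline described here.

```python
import csv
import io


def combine_metrics(per_cell):
    """Merge {cell_id: {metric: value}} into (columns, rows) for one table."""
    columns = ["CELL_ID"]
    for metrics in per_cell.values():
        for name in metrics:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    rows = [{"CELL_ID": cell_id, **metrics} for cell_id, metrics in per_cell.items()]
    return columns, rows


def write_table(columns, rows, handle):
    """Write the combined table as CSV; missing metrics become 'NA'."""
    writer = csv.DictWriter(handle, fieldnames=columns, restval="NA",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up values for two hypothetical cells.
per_cell = {
    "cell_A": {"PCT_PF_READS_ALIGNED": 0.75, "PERCENT_DUPLICATION": 0.10},
    "cell_B": {"PCT_PF_READS_ALIGNED": 0.25, "MEDIAN_INSERT_SIZE": 300},
}
columns, table_rows = combine_metrics(per_cell)
buffer = io.StringIO()
write_table(columns, table_rows, buffer)
print(buffer.getvalue())
```

Keeping one row per cell and one column per metric makes it straightforward to draw a density plot per column afterwards.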
First, we compared metrics between the two cell types at the same read length, 25bp.
`TOTAL_READS` density plot. Primary cancer cells yield slightly fewer total reads.
`PCT_PF_READS_ALIGNED` density plot. Overall, both cell types yield an alignment rate of ~75%. There is an unusual peak at a 25% alignment rate.
`PF_MISMATCH_RATE` density plot. For primary cells, `PF_MISMATCH_RATE` shows an abnormal distribution with two peaks.
`PCT_USABLE_BASES` density plot.
`PCT_RIBOSOMAL_BASES` density plot. Both cell types show a good, that is low, percentage of ribosomal bases.
`MEDIAN_CV_COVERAGE` density plot. This metric is the median coefficient of variation (CV), that is stdev/mean, of the coverage values of the 1000 most highly expressed transcripts. Lower values are better.
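The stdev/mean definition above can be made concrete with a small sketch. It assumes per-transcript coverage is available as lists of per-base depths and uses the population standard deviation; which standard deviation the upstream tool uses is not stated here, so treat that choice, the function names, and the toy numbers as assumptions.

```python
from statistics import mean, median, pstdev


def coverage_cv(coverage):
    """Coefficient of variation (stdev / mean) of one transcript's coverage."""
    avg = mean(coverage)
    if avg == 0:
        raise ValueError("CV is undefined for zero mean coverage")
    return pstdev(coverage) / avg


def median_cv_coverage(per_transcript_coverage):
    """Median of the per-transcript CVs."""
    return median(coverage_cv(cov) for cov in per_transcript_coverage)


# Toy per-base coverage for three hypothetical transcripts:
# perfectly even, moderately uneven, and very uneven.
toy_coverage = [
    [10, 10, 10, 10],
    [5, 15, 5, 15],
    [0, 20, 0, 20],
]
toy_median_cv = median_cv_coverage(toy_coverage)
print(toy_median_cv)
```

Perfectly even coverage gives a CV of 0, which is why lower values indicate more uniform coverage along transcripts.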
`MEDIAN_3PRIME_BIAS` and `MEDIAN_5PRIME_BIAS` density plots. Both cell types show low 5' and 3' end bias, but a long tail extends into the high-bias region, which indicates RNA degradation in some cells.
`MEDIAN_INSERT_SIZE` and `MEDIAN_ABSOLUTE_DEVIATION` density plots. Both cell types fall within the expected range of 200~400bp.
`PERCENT_DUPLICATION` density plot. Both cell types show a duplication rate of ~10%, while a subset of cells shows a high duplication rate of ~75%.
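Cells in the secondary peaks described above (alignment rate near 25%, duplication rate near 75%) could be listed for closer inspection with a simple filter. The sketch below is illustrative only: the cutoffs of 0.5 are hypothetical values chosen to separate those peaks, the metrics are assumed to be stored as fractions, and the function name and example cells are made up.

```python
def flag_outlier_cells(table, min_aligned=0.5, max_duplication=0.5):
    """Return {cell_id: [reasons]} for cells outside the (hypothetical) cutoffs."""
    flagged = {}
    for cell_id, metrics in table.items():
        reasons = []
        if metrics["PCT_PF_READS_ALIGNED"] < min_aligned:
            reasons.append("low alignment rate")
        if metrics["PERCENT_DUPLICATION"] > max_duplication:
            reasons.append("high duplication rate")
        if reasons:
            flagged[cell_id] = reasons
    return flagged


# Made-up metrics for three hypothetical cells.
qc_table = {
    "cell_A": {"PCT_PF_READS_ALIGNED": 0.76, "PERCENT_DUPLICATION": 0.10},
    "cell_B": {"PCT_PF_READS_ALIGNED": 0.25, "PERCENT_DUPLICATION": 0.12},
    "cell_C": {"PCT_PF_READS_ALIGNED": 0.74, "PERCENT_DUPLICATION": 0.75},
}
flagged_cells = flag_outlier_cells(qc_table)
print(flagged_cells)
```

Any real cutoff should be chosen from the observed distributions rather than from these placeholder values.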
Next, we compared metrics between paired libraries that have different read lengths but come from the same human subject sample.
For alignment, we examined `PCT_PF_READS_ALIGNED`, `PF_MISMATCH_RATE`, and `MEDIAN_CV_COVERAGE` between paired libraries.
For RNA metrics, we examined `PCT_USABLE_BASES`, `PCT_INTRONIC_BASES`, and `PCT_RIBOSOMAL_BASES`.
For insert size metrics, we examined `MEDIAN_INSERT_SIZE`.
For duplication metrics, we examined `PERCENT_DUPLICATION`.
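A paired comparison of one metric between the two read lengths can be sketched as follows. It assumes each library is a dictionary keyed by a shared sample identifier, which is a hypothetical pairing key; the function name and the values are made up for illustration.

```python
from statistics import mean


def paired_differences(metrics_25bp, metrics_100bp, metric):
    """Per-sample difference (100bp minus 25bp) for samples present in both."""
    shared = sorted(set(metrics_25bp) & set(metrics_100bp))
    return {
        sample: metrics_100bp[sample][metric] - metrics_25bp[sample][metric]
        for sample in shared
    }


# Made-up values; cell_3 has no 100bp library and is therefore skipped.
lib_25bp = {
    "cell_1": {"PCT_PF_READS_ALIGNED": 0.70},
    "cell_2": {"PCT_PF_READS_ALIGNED": 0.78},
    "cell_3": {"PCT_PF_READS_ALIGNED": 0.74},
}
lib_100bp = {
    "cell_1": {"PCT_PF_READS_ALIGNED": 0.76},
    "cell_2": {"PCT_PF_READS_ALIGNED": 0.80},
}
diffs = paired_differences(lib_25bp, lib_100bp, "PCT_PF_READS_ALIGNED")
mean_diff = mean(diffs.values())
print(diffs, mean_diff)
```

Restricting the comparison to samples present in both libraries keeps the pairing valid when one read length covers fewer cells, as in the sample counts table.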