This folder contains the intermediate result files, organized as follows:
This was computed using 1.1.1.sh, which produces CSV output with the following columns:
- file name
- total reads
- average percentage of bases ≥ Q30
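
For illustration, the two statistics above could be derived from a FASTQ file roughly as follows. This is a minimal Python sketch, not the contents of 1.1.1.sh: the function name `q30_stats` is hypothetical, and it assumes Phred+33 quality encoding, strict four-line FASTQ records, and that "average" means the mean of the per-read percentages.

```python
import io


def q30_stats(handle, phred_offset=33):
    """Return (total_reads, mean per-read percentage of bases with Q >= 30).

    Assumes four-line FASTQ records and Phred+33 encoding (both assumptions).
    """
    total_reads = 0
    pct_sum = 0.0
    for i, line in enumerate(handle):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\n")
        total_reads += 1
        if qual:
            n_q30 = sum(1 for ch in qual if ord(ch) - phred_offset >= 30)
            pct_sum += 100.0 * n_q30 / len(qual)
    mean_pct = pct_sum / total_reads if total_reads else 0.0
    return total_reads, mean_pct


# Two toy reads: 'I' encodes Q40, '#' encodes Q2.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII##\n")
print(q30_stats(example))  # (2, 75.0)
```

If the average is instead meant over all bases pooled across reads, the per-read division would be replaced by two running totals (Q30 bases and all bases).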
This was computed with cutadapt.smk, which uses cutadapt to quality-trim the reads at a Q30 threshold. We assess the effect of trimming by comparing the FastQC reports from before and after.
This was computed using 1.2.1.py. This script defines a homopolymer as a run of ≥4 identical consecutive bases, and produces CSV output with the following columns:
- file name
- total reads
- average decay
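
The homopolymer definition (a stretch of ≥4 identical bases) can be sketched with a backreference regex. This is only an illustration, not the code in 1.2.1.py: `find_homopolymers` is a hypothetical name, runs of `N` are deliberately excluded (an assumption), and the derivation of the "average decay" column is not reproduced because it is not described here.

```python
import re

# One base followed by at least three more copies of itself: a run of >= 4.
# Restricting to A/C/G/T (ignoring N) is an assumption of this sketch.
HOMOPOLYMER = re.compile(r"([ACGT])\1{3,}")


def find_homopolymers(seq):
    """Return a list of (start, base, length) for each run of >= 4 identical bases."""
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in HOMOPOLYMER.finditer(seq.upper())
    ]


# 'AAAA' starts at index 4, 'CCC' is too short, 'GGGGG' starts at index 11.
print(find_homopolymers("ACGTAAAACCCGGGGGT"))  # [(4, 'A', 4), (11, 'G', 5)]
```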
This was computed using 1.2.2.sh, which produces CSV output with the following columns:
- Category, i.e. Empty, Undetermined, or Demultiplexed
- Number of reads
(Each row in the CSV corresponds to one input file.)
This was performed by mapping each paired file set to the PhiX reference genome in the data folder using bowtie2, which yielded one SAM file per file pair. All SAM files were then merged, sorted, and indexed using samtools. Subsequently, PhiX Error Rate (2 points), PhiX Alignment Rate (1 point), and PhiX Coverage Uniformity (1 point) were computed with assess_phix_quality.py.
This was computed using 1.2.4.py, which produces CSV output with the following columns:
- file name
- total reads
- unique reads
- duplicate reads
- duplicate rate
- duplicate percentage
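
As a rough illustration of how these columns relate to one another, the sketch below counts duplicates by exact sequence identity. It is not the code in 1.2.4.py: the function name `duplicate_stats` and the definitions used (unique reads = distinct sequences, duplicate reads = total minus unique, rate as a fraction, percentage as rate × 100) are assumptions.

```python
from collections import Counter


def duplicate_stats(sequences):
    """Summarise exact-sequence duplication for an iterable of read sequences."""
    counts = Counter(sequences)
    total = sum(counts.values())
    unique = len(counts)          # number of distinct sequences
    duplicates = total - unique   # every copy beyond the first of each sequence
    return {
        "total_reads": total,
        "unique_reads": unique,
        "duplicate_reads": duplicates,
        "duplicate_rate": duplicates / total if total else 0.0,
        "duplicate_percentage": 100.0 * duplicates / total if total else 0.0,
    }


stats = duplicate_stats(["ACGT", "ACGT", "TTTT", "GGGG", "ACGT"])
print(stats["total_reads"], stats["unique_reads"], stats["duplicate_reads"])  # 5 3 2
```

Under these definitions the rate and percentage columns carry the same information on different scales (0.4 and 40.0 for the toy input above).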