R scripts, Bash scripts and input tables.
Two R scripts for data analysis:
- raspir_performance_evaluation_I.R (data analysis, mock community simulation)
- raspir_performance_evaluation_II.R (data analysis, real-world dataset)
exampleRun_mockCommunity_seed222.csv (input file, heatmap visualisation)
- Species are given as row names and run parameters as column names.
- Column names starting with "raspir_" show results obtained when incorporating raspir into the alignment procedure.
- Column names starting with "normal_" show the alignment results without raspir.
- The numeric suffix of each column name (c030, c050, ...) refers to the number of short reads that were selected for the rare species of the mock community.
- Explanation of the numerical outcome codes:
  - 0: True negative species
  - 1: True positive rare species
  - 2: False positive species
  - 3: True positive core species
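As an illustration of how the outcome codes and column names can be read, here is a minimal Python sketch. It assumes that the species names sit in a first column called `species`; that header, the species names and all values in the excerpt are made up for the example and are not taken from the real file. The helper names `parse_column` and `count_outcomes` are hypothetical as well.

```python
import csv
import io
from collections import Counter

# Outcome codes as documented for exampleRun_mockCommunity_seed222.csv.
OUTCOME_LABELS = {
    0: "true negative",
    1: "true positive (rare)",
    2: "false positive",
    3: "true positive (core)",
}

# Hypothetical excerpt: the "species" header, names and values are invented.
EXAMPLE_CSV = """species,raspir_c030,normal_c030,raspir_c050,normal_c050
Species_A,3,3,3,3
Species_B,1,1,1,1
Species_C,0,2,0,2
Species_D,0,2,0,0
"""


def parse_column(name):
    """Split e.g. 'raspir_c030' into ('raspir', 30)."""
    method, suffix = name.split("_", 1)
    return method, int(suffix.lstrip("c"))


def count_outcomes(csv_text):
    """Return {column name: Counter of outcome labels} for every run column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {}
    for row in reader:
        for column, value in row.items():
            if column == "species":  # assumed name of the row-name column
                continue
            label = OUTCOME_LABELS[int(value)]
            counts.setdefault(column, Counter())[label] += 1
    return counts


if __name__ == "__main__":
    for column, counter in count_outcomes(EXAMPLE_CSV).items():
        method, reads = parse_column(column)
        print(method, reads, dict(counter))
```

Comparing the "false positive" counts of a `raspir_` column with those of the matching `normal_` column gives a quick textual summary of what the heatmap shows.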
raspir_run_statistics.csv (data analysis, clinimetric properties)
- Contains all numerical results of the simulations, which were run with 20 different seeds for the random read generator.
- Two different alignment tools were used (Bowtie 2 and BWA).
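The exact columns of raspir_run_statistics.csv are not described here, so the following Python sketch only shows how common clinimetric measures could be derived from confusion-matrix counts. The function name `clinimetrics`, the choice of measures and the counts in the usage example are assumptions made for illustration, not values from the file.

```python
def clinimetrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positive, false positive,
    true negative and false negative species calls.
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for a single simulation run.
    for name, value in clinimetrics(tp=8, fp=2, tn=88, fn=2).items():
        print(f"{name}: {value:.3f}")
```

Applying such a function once per seed and per alignment tool would give one row of measures per run, which can then be summarised across the 20 seeds.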
download_fastq.sh (bash script for downloading biological samples with sra-explorer)
rawCounts_merged_samples_SRR7049258 (count table, per sample and species with raw read counts)
RPMM_merged_samples_SRR7049258 (count table, per sample and species with normalised read counts; RPMM: normalised by genome length and sequencing depth)
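The relationship between the raw and the normalised count table can be sketched as follows. The description above only states that RPMM accounts for genome length and sequencing depth; the formula used here (reads per million base pairs of genome, per million sequenced reads) is an assumption, and the function name `rpmm` and the numbers in the usage example are hypothetical.

```python
def rpmm(raw_reads, genome_length_bp, total_reads):
    """Normalise a raw read count by genome length and sequencing depth.

    Assumed definition: reads per million base pairs of the reference
    genome, per million reads sequenced in the sample.
    """
    if genome_length_bp <= 0 or total_reads <= 0:
        raise ValueError("genome length and total reads must be positive")
    reads_per_mbp = raw_reads / (genome_length_bp / 1e6)
    return reads_per_mbp / (total_reads / 1e6)


if __name__ == "__main__":
    # Hypothetical species: 500 reads, 5 Mbp genome, 2 million reads in the sample.
    print(rpmm(500, 5_000_000, 2_000_000))
```

Under this assumed definition, two species with the same raw count but different genome sizes receive different normalised values, which is what makes the counts comparable across species and samples.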
Contains all data tables obtained with raspir
A) Compressed .FASTA files of the core and rare species of the mock community
Core species
Rare species