This repository provides helper scripts used in the analysis of the genomic and phylogenetic data presented in the publication "Highly-resolved genomes of two closely related lineages of the rodent louse Polyplax serrata with different host specificities".
Used to obtain ortholog sets in FASTA format, covering different arthropod species, from a NEXUS-format ortholog alignment file and its corresponding partition file, both sourced from de Moya et al. (2020). The script requires Biopython.
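The splitting step can be sketched as below. This is a minimal sketch under several assumptions:

- The alignment has already been loaded into a `{taxon: aligned_sequence}` dictionary. With Biopython this could be done, for example, with `{rec.id: str(rec.seq) for rec in AlignIO.read(path, "nexus")}`.
- Each partition line names one ortholog and its 1-based, inclusive column range.

The partition-line formats, the `drop_empty` behaviour and all function names are illustrative assumptions. They are not the repository script's actual interface.

```python
import re

# Hypothetical partition-line formats; adjust the pattern to the real file:
#   "charset ortholog_1 = 1-350;"  (NEXUS sets block)
#   "DNA, ortholog_1 = 1-350"      (RAxML-style partition file)
PARTITION_RE = re.compile(r"(\S+)\s*=\s*(\d+)\s*-\s*(\d+)")


def parse_partitions(text):
    """Return (name, start, end) tuples with 1-based, inclusive coordinates."""
    partitions = []
    for line in text.splitlines():
        match = PARTITION_RE.search(line)
        if match:
            name, start, end = match.group(1), int(match.group(2)), int(match.group(3))
            partitions.append((name, start, end))
    return partitions


def split_alignment(alignment, partitions, drop_empty=True):
    """Slice a {taxon: aligned_sequence} dict into one {taxon: subsequence} dict per partition."""
    orthologs = {}
    for name, start, end in partitions:
        block = {}
        for taxon, seq in alignment.items():
            sub = seq[start - 1:end]  # 1-based inclusive range -> Python slice
            if drop_empty and set(sub) <= set("-?"):
                continue  # skip taxa with only gaps/missing data for this ortholog
            block[taxon] = sub
        orthologs[name] = block
    return orthologs


def to_fasta(block):
    """Format one ortholog block as FASTA text."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in block.items())
```

Writing `to_fasta(block)` to one file per partition name (for example `ortholog_1.fasta`, a hypothetical naming scheme) yields one FASTA file per ortholog.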
Uses bamCaller.py to iterate over a list of BAM files and defined contigs, generating a single VCF file and a BED-format mask file for each sample ID/contig ID combination.
Requires:
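The iteration over BAM files and contigs can be sketched as follows. The pipeline string is modelled on the example in the msmc-tools README, which pipes `bcftools mpileup` into `bcftools call -c -V indels` and then into `bamCaller.py <mean_cov> <mask>`.

The following are assumptions for illustration and may differ from what bamcall_multi.sh actually does:

- the quality thresholds;
- the per-contig output layout;
- the file-naming scheme;
- the `mean_cov` dictionary.

```python
import shlex
from pathlib import Path


def build_calls(bams, contigs, reference, mean_cov, outdir="."):
    """Return one (sample, contig, shell_command) tuple per BAM x contig pair.

    mean_cov maps sample ID -> mean sequencing depth, passed to bamCaller.py
    as its first positional argument (assumed from the msmc-tools example).
    """
    q = shlex.quote
    jobs = []
    for bam in bams:
        sample = Path(bam).stem  # assumption: sample ID = BAM file name minus ".bam"
        for contig in contigs:
            # One subfolder per contig; one VCF plus one mask per sample/contig pair.
            prefix = Path(outdir) / contig / f"{sample}_{contig}"
            vcf = f"{prefix}.vcf.gz"
            mask = f"{prefix}_mask.bed.gz"
            cmd = (
                f"bcftools mpileup -q 20 -Q 20 -C 50 -r {q(contig)} -f {q(reference)} {q(bam)}"
                " | bcftools call -c -V indels"
                f" | bamCaller.py {mean_cov[sample]} {q(mask)}"
                f" | gzip -c > {q(vcf)}"
            )
            jobs.append((sample, contig, cmd))
    return jobs
```

Each command can be executed with `subprocess.run(cmd, shell=True, check=True)` after creating its contig folder. Keeping one folder per contig mirrors the subfolder organization described for the multihetsep step.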
Uses generate_multihetsep.py to parse and process the VCF and BED files generated by bamcall_multi.sh, which are organized into separate subfolders by contig ID. The output multihetsep files are used downstream in the msmc2-based analysis.
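Per-contig inputs could be collected and turned into one generate_multihetsep.py call per contig, as sketched below. The sketch assumes each contig subfolder holds `<sample>_<contig>.vcf.gz` and `<sample>_<contig>_mask.bed.gz` files; this naming scheme and the output path are hypothetical. Only the repeatable `--mask` option and the positional VCF arguments of generate_multihetsep.py are used, and other options are omitted.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_contig(paths):
    """Group VCF and mask paths by the name of their parent (contig) folder."""
    groups = defaultdict(lambda: {"vcf": [], "mask": []})
    for p in sorted(paths):  # sorting keeps sample order identical in both lists
        path = PurePosixPath(p)
        contig = path.parent.name
        if path.name.endswith("_mask.bed.gz"):
            groups[contig]["mask"].append(str(path))
        elif path.name.endswith(".vcf.gz"):
            groups[contig]["vcf"].append(str(path))
    return dict(groups)


def multihetsep_command(contig, files, outdir="multihetsep"):
    """Build one generate_multihetsep.py shell command for a single contig."""
    args = ["generate_multihetsep.py"]
    args += [f"--mask={mask}" for mask in files["mask"]]
    args += files["vcf"]
    return " ".join(args) + f" > {outdir}/{contig}.multihetsep.txt"
```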
Runs msmc2 with two different settings to assess the robustness and variability of population history inference under defined conditions. The script iterates over bootstrap replicates placed in separate folders and runs 5 jobs in parallel, with 20 threads assigned to each job. The results are written to two separate output folders, one for each setting.
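The parallel loop can be sketched as follows, assuming each bootstrap replicate folder's multihetsep files are already listed in a dictionary.

- The two entries in `SETTINGS` are placeholders that differ only in their `-p` time-segment pattern; they are not the settings used in the study.
- `-t` sets msmc2's thread count and `-o` its output prefix.
- A thread pool with five workers caps the number of concurrent msmc2 processes, so 5 jobs x 20 threads uses up to 100 cores.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder settings (hypothetical): the two configurations actually used
# are not given in this README. "-p" sets msmc2's time-segment pattern.
SETTINGS = {
    "setting_A": ["-p", "1*2+25*1+1*2+1*3"],
    "setting_B": ["-p", "1*2+15*1+1*2"],
}


def build_msmc2_jobs(replicates, settings=SETTINGS, threads=20):
    """Build one msmc2 argv per (setting, bootstrap replicate) pair.

    replicates maps a replicate folder name -> list of multihetsep files.
    Each setting writes into its own output folder.
    """
    jobs = []
    for setting, extra_args in settings.items():
        for replicate, inputs in sorted(replicates.items()):
            out_prefix = os.path.join(setting, replicate)
            jobs.append(["msmc2", "-t", str(threads), *extra_args,
                         "-o", out_prefix, *inputs])
    return jobs


def run_jobs(jobs, max_parallel=5):
    """Run the jobs with at most max_parallel msmc2 processes at a time."""
    def run_one(argv):
        out_prefix = argv[argv.index("-o") + 1]
        os.makedirs(os.path.dirname(out_prefix) or ".", exist_ok=True)
        return subprocess.run(argv).returncode

    # Threads suffice: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, jobs))
```

`run_jobs` returns one exit code per job in submission order, so failed replicates can be identified afterwards.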
R script that uses the vegan library to calculate Bray-Curtis dissimilarity from the Pfam and InterProScan hits of the compared genomes, performs principal coordinates analysis (PCoA) on the resulting dissimilarity matrix, and visualizes the result.
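For readers not using R, the two computations can be sketched in Python with NumPy. This is not a port of the repository's R script.

- Bray-Curtis dissimilarity between genomes x and y is `BC(x, y) = sum_i |x_i - y_i| / sum_i (x_i + y_i)`, which is what vegan's `vegdist(..., method = "bray")` computes.
- Classical PCoA eigendecomposes the double-centred matrix `-1/2 * J D^2 J`, where `D^2` is the element-wise squared dissimilarity matrix.

The input is assumed to be a genomes x domains count matrix of Pfam (or InterProScan) hits. The function names are hypothetical.

```python
import numpy as np


def bray_curtis_matrix(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (genomes) of a count matrix."""
    X = np.asarray(counts, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = X[i].sum() + X[j].sum()
            # BC = sum|x - y| / sum(x + y); two all-zero rows are treated as identical.
            d = np.abs(X[i] - X[j]).sum() / total if total > 0 else 0.0
            D[i, j] = D[j, i] = d
    return D


def pcoa(D, k=2):
    """Classical PCoA: double-centre -D^2/2 and keep the top-k eigenvectors."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean parts of Bray-Curtis) get zero length.
    scale = np.sqrt(np.clip(eigvals[:k], 0.0, None))
    return eigvecs[:, :k] * scale, eigvals
```

The first two columns of the returned coordinates can be scatter-plotted, one point per genome. The share of each positive eigenvalue in their sum gives the variance explained by each axis.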