Calculate divergence for sites of each annotation category

This program calculates divergence between two species for sites of each annotated category.

Environment

Follow the instructions in the main documentation to set up the conda environment 'WGS_analysis'.

How to run this program

Configure file paths in the shell script and run it in bash.

In most cases, you do not have to modify the core Python script (count_divergent_sites.py), which counts the number of differing reference alleles within each alignment block, or the core shell script (parallel_by_block.sh), which runs that count in parallel across alignment blocks and then calculates the proportion of differing reference alleles across all blocks for each site category.
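The final proportion can be illustrated with a short Python sketch. This is not the shell script's actual code: the function name `divergence_by_category` and the per-block representation (a dict mapping each category to a `(differing, total)` pair) are assumptions made only for illustration. It pools counts over all blocks before dividing, which is one reading of "proportion across all blocks".

```python
def divergence_by_category(block_counts):
    """Pool per-block counts and return the proportion of differing sites.

    block_counts: iterable of dicts {category: (n_differing, n_sites)}.
    This layout is hypothetical; the real scripts may store counts differently.
    """
    diffs = {}
    sites = {}
    for counts in block_counts:
        for category, (n_diff, n_sites) in counts.items():
            diffs[category] = diffs.get(category, 0) + n_diff
            sites[category] = sites.get(category, 0) + n_sites
    # Categories with no counted sites are left out to avoid dividing by zero.
    return {c: diffs[c] / sites[c] for c in sites if sites[c] > 0}


# Two made-up alignment blocks, purely for demonstration.
blocks = [
    {"exon": (1, 50), "intron": (4, 40)},
    {"exon": (1, 50), "intron": (6, 60)},
]
divergence = divergence_by_category(blocks)
```

Pooling before dividing weights each block by its number of sites, so short blocks do not dominate the estimate.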

Input & output

Run the following command to see input and output files:

python count_divergent_sites.py -h

Detailed steps

1. Get the coordinates of an alignment block in the target species (i.e. Drosophila suzukii) from the 'map' file;
2. Read the multiple fasta alignment (.mfa) file of this block, and map the bases of the outgroup species (i.e. Drosophila biarmipes) onto the genome of the target species. This generates a bed file with the coordinates of matched sites in the target species, and a sequence recording allele substitution/consistency at each matched site (labeled as s/c);
3. Use the bed file to generate a tab-delimited table of site annotations;
4. Read the site annotation table and the substitution/consistency sequence simultaneously, counting the allele differences for each annotation category;
5. Calculate divergence as the proportion of differing sites for each annotation category;
6. (Optional) Plot divergence for each annotation category.

Steps 1-4 are done by count_divergent_sites.py for each alignment block, and step 5 is implemented by parallel_by_block.sh. Step 6 is implemented by divergence_barplot.R.
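The per-block labeling and counting can be sketched in Python as follows. This is a minimal illustration, not the code in count_divergent_sites.py: the function names, the treatment of gaps and 'N' as unmatched, the case-insensitive comparison, and the in-memory inputs are all assumptions.

```python
from collections import Counter

# Characters treated as "no matched base" (assumption, not taken from the script).
GAP_OR_MISSING = {"-", "N"}


def label_sites(target_aln, outgroup_aln):
    """Return an s/c string with one character per matched alignment column."""
    if len(target_aln) != len(outgroup_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for t, o in zip(target_aln.upper(), outgroup_aln.upper()):
        if t in GAP_OR_MISSING or o in GAP_OR_MISSING:
            continue  # unmatched column: skipped
        labels.append("c" if t == o else "s")
    return "".join(labels)


def count_by_category(labels, categories):
    """Walk labels and annotations together; return {category: (n_s, n_total)}."""
    if len(labels) != len(categories):
        raise ValueError("need exactly one annotation per matched site")
    substituted = Counter()
    total = Counter()
    for label, category in zip(labels, categories):
        total[category] += 1
        if label == "s":
            substituted[category] += 1
    return {c: (substituted[c], total[c]) for c in total}


# Toy aligned rows for one block; the annotations are made up.
labels = label_sites("ACG-TTA", "ACCGT-A")
counts = count_by_category(labels, ["exon", "exon", "intron", "intron", "exon"])
```

Here the two gapped columns are dropped, so five matched sites remain and each one needs exactly one annotation.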
