GitHub - alfredsimkin/AMTA: ancestral motif turnover analysis

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
script_folder		script_folder
supplemental_scripts		supplemental_scripts
README		README
calculate_turnover_rates.py		calculate_turnover_rates.py
orthologs_alignments_ancestors.py		orthologs_alignments_ancestors.py
prepare_lastz.py		prepare_lastz.py

Repository files navigation

This set of scripts was developed to accompany my paper, "Inferring the
evolutionary history of primate microRNA binding sites: overcoming motif
counting biases." I wrote these scripts to analyze how frequently miRNA binding
sites are gained and lost within 3' UTRs over evolutionary time within
primates. I'm hoping others will find these scripts useful to analyze turnover
rates of other motifs in other regions of the genome and in other species sets.
These scripts can be reused for any purpose provided my paper is cited.

The first component of my scripts is written to curate orthologous UTRs in
multiple species using some well-annotated reference species, and then analyze
the resulting aligned genes using a maximum likelihood approach implemented by
DNAML. The second component of my scripts (in sfs_sims under supplemental
scripts) is more loosely arranged to analyze simulated datasets and demonstrate
flaws in motif-based approaches to phylogenetic reconstruction of ancestral
states.

The following instructions are for the first component. Other scripts have
their own README files.

The original input files are:
1. A list of all Refseq gene annotations from UCSC in BED format
2. A large sequence alignment file aligning every part of the gorilla genome 
to 10 other primates, downloaded from UCSC in MAF format
3. individual primate genomes for five species

The core of my analysis is done using two publicly available programs, 
called lastz (used to compare genomes against each other) and dnaml (used to 
reconstruct hypothetical ancestral DNA sequences from living primate 
descendants, available from Joe Felsenstein's phylip package). You will need
to have these programs installed.

There are 3 main scripts needed to get my main results, each of which is a 
collection of smaller scripts. They're named:

prepare_lastz.py
orthologs_alignments_ancestors.py
calculate_turnover_rates.py

Each of these scripts is a driver for several smaller scripts, described in
the comments.

Briefly, prepare_lastz.py tidies up a BED file of transcripts in one genome and
prepares to search other genomes for these transcripts. The program also tidies up
a MAF file for use later on.

After running lastz on the results of prepare_lastz.py,
orthologs_alignments_ancestors.py uses stringent filters to remove non 1 to 1 orthologs
from the dataset, creates alignments of these 1 to 1 orthologs using the MAF file, and
uses DNAML to reconstruct ancestral transcript states.

calculate_turnover_rates.py operates on the output of orthologs_alignments_ancestors.py
and calculates how fast every 8mer turns over relative to other highly similar 8mers.