alfredsimkin/AMTA
This set of scripts was developed to accompany my paper, "Inferring the evolutionary history of primate microRNA binding sites: overcoming motif counting biases." I wrote these scripts to analyze how frequently miRNA binding sites are gained and lost within 3' UTRs over evolutionary time within primates. I'm hoping others will find these scripts useful to analyze turnover rates of other motifs in other regions of the genome and in other species sets. These scripts can be reused for any purpose provided my paper is cited.

The first component of my scripts is written to curate orthologous UTRs in multiple species using some well-annotated reference species, and then analyze the resulting aligned genes using a maximum likelihood approach implemented by DNAML. The second component of my scripts (in sfs_sims under supplemental scripts) is more loosely arranged to analyze simulated datasets and demonstrate flaws in motif-based approaches to phylogenetic reconstruction of ancestral states. The following instructions are for the first component. Other scripts have their own README files.

The original input files are:

1. A list of all RefSeq gene annotations from UCSC in BED format
2. A large sequence alignment file aligning every part of the gorilla genome to 10 other primates, downloaded from UCSC in MAF format
3. Individual primate genomes for five species

The core of my analysis is done using two publicly available programs, called lastz (used to compare genomes against each other) and dnaml (used to reconstruct hypothetical ancestral DNA sequences from living primate descendants, available from Joe Felsenstein's PHYLIP package). You will need to have these programs installed.

There are 3 main scripts needed to get my main results, each of which is a collection of smaller scripts. They're named:

prepare_lastz.py
orthologs_alignments_ancestors.py
calculate_turnover_rates.py

Each of these scripts is a driver for several smaller scripts, described in the comments.
Briefly, prepare_lastz.py tidies up a BED file of transcripts in one genome and prepares to search other genomes for these transcripts. It also tidies up a MAF file for use later on. After you run lastz on the output of prepare_lastz.py, orthologs_alignments_ancestors.py uses stringent filters to remove any orthologs that are not 1-to-1 from the dataset, creates alignments of the remaining 1-to-1 orthologs using the MAF file, and uses DNAML to reconstruct ancestral transcript states. calculate_turnover_rates.py operates on the output of orthologs_alignments_ancestors.py and calculates how fast every 8mer turns over relative to other highly similar 8mers.
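To make the idea of motif turnover concrete, here is a minimal sketch of how a single 8mer could be classified as gained, lost, conserved, or absent along one branch, given a reconstructed ancestral sequence and a descendant sequence. This is a simplified illustration only, not the method implemented in calculate_turnover_rates.py: the function names (classify_branch, tally_turnover) and the example sequences are hypothetical, and it does not perform the comparison against highly similar 8mers described above.

```python
from collections import Counter
from typing import Iterable, Tuple


def contains_motif(seq: str, motif: str) -> bool:
    """Check for a motif after removing alignment gaps and normalizing case."""
    ungapped = seq.replace("-", "").upper()
    return motif.upper() in ungapped


def classify_branch(ancestor: str, descendant: str, motif: str) -> str:
    """Classify one motif on one ancestor-to-descendant branch.

    Hypothetical helper for illustration; presence is judged per sequence,
    not per aligned position.
    """
    in_ancestor = contains_motif(ancestor, motif)
    in_descendant = contains_motif(descendant, motif)
    if in_ancestor and in_descendant:
        return "conserved"
    if in_ancestor:
        return "loss"
    if in_descendant:
        return "gain"
    return "absent"


def tally_turnover(branches: Iterable[Tuple[str, str]], motif: str) -> Counter:
    """Count gains, losses, conserved and absent calls across many branches."""
    return Counter(classify_branch(anc, desc, motif) for anc, desc in branches)


if __name__ == "__main__":
    # Made-up (ancestor, descendant) pairs, one per gene.
    pairs = [
        ("AAGCACTTACCC", "AAGCACTAACCC"),   # motif disrupted in the descendant
        ("AAGCACTAACCC", "AAGCACTTACCC"),   # motif created in the descendant
        ("AAG-CACTTACCC", "AAGCACTTACCC"),  # motif kept; gap in the alignment
    ]
    print(tally_turnover(pairs, "GCACTTAC"))
```

Presence or absence calls like these are only as good as the ancestral reconstruction and the ortholog filtering that come before them, which is why the earlier steps in the pipeline matter.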
About
ancestral motif turnover analysis