SEpapoulis/EscalationAndDe-escalationOfRM

Resource availability and viral DNA methylation drive the diversity and abundance of Restriction Modification Systems

Bioinformatic and Mathematical Modeling Results

Dependencies

Python 3 (>=3.7)
BLAST+ (>=2.9.0)
HMMER (>=3.1b2)
cdhit
hhsuite

Bioinformatic analysis is limited to OS X and Linux because of constraints in the alignment software. Numerical simulations are OS independent, as they are run through SciPy.
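Before running the bioinformatic notebooks, it can help to confirm that the dependencies listed above are visible on your PATH. The following is a minimal sketch, not part of the project code. The executable names (blastp, hmmsearch, cd-hit, hhsearch) are assumptions about which binaries each package provides on your system; adjust them if your installation differs.

```python
import shutil
import sys

# Assumed executable names for each dependency; these are illustrative
# and may differ from the binaries the notebooks actually call.
REQUIRED_TOOLS = {
    "BLAST+": "blastp",
    "HMMER": "hmmsearch",
    "cdhit": "cd-hit",
    "hhsuite": "hhsearch",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the dependency names whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Check the interpreter against the minimum Python version (3.7)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python >= 3.7 is required")
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH")
```

This only checks that an executable exists; it does not verify the installed version numbers.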

Installation

Navigate to your directory of choice and clone our GitHub repo:
git clone https://github.com/SEpapoulis/EscalationAndDe-escalationOfRM.git
From here, launch any of our Jupyter notebooks to view our code, or open demo/Demo.ipynb for an easily executable sample of our analysis.

Easy Installation of Dependencies

We highly recommend installing the Anaconda data science package, as all required software and modules are available through conda install. After Anaconda is installed, enter the following in the conda prompt:
conda install -c bioconda hmmer
conda install -c bioconda blast
conda install -c bioconda cd-hit
conda install -c bioconda hhsuite

Expert Installation of Dependencies

Independently install all of the dependencies on your machine.
cdhit
hhsuite
HMMER
BLAST+

Data Demo

We have provided a demo of the RM pipeline, as well as sample code to run numerical simulations using our memory model. Open the Jupyter notebook and start executing cells. The code used in this demo is a representative sample of the whole project: the bioinformatic code was applied to all 139,000 genomes, and the numerical simulations were executed as described. The sample bioinformatic analysis with Microcystis takes ~15 min, while the numerical simulations may take between one and two hours depending on your machine. If you wish to see all notebooks used in this research, please see the NotebooksAndData directory.

Viewing project code

This code documents the results used to deepen our understanding of the selective pressures that govern the loss and gain of Restriction Modification Systems among prokaryotes. The project is documented in executable Jupyter Notebooks, and all imports are listed at the beginning of each notebook, with the exception of R code, where libraries are imported per script. Jupyter Notebooks can be rendered on GitHub; if a notebook fails to render, it can be downloaded and viewed locally. To view Jupyter Notebooks locally, simply install the Anaconda data science package.

Project Files

Code used to generate main body figures can be viewed in Manuscript_Figures.ipynb, while code for supplementary figures can be viewed in Manuscript_Figures_SI.ipynb, except for figures S7-S11, which can be found in NotebooksAndData/RM_Database/definitions/Defining_RM_types_5-3-2018.ipynb. Below, we describe the contents of our project folders.

  1. RM_Database - All database files needed to recapitulate RM annotation

    1. BLASTexceptions.fasta - Sequences without HMMs used to find RM genes via BLAST
    2. Falsepos_HMMs.txt - HMMs that covaried with false positives (mostly helicases)
    3. RM_HMMs.txt - HMMs found in non-putative Restriction Modification Systems
    4. non-putative_rebase.fasta - A reformatted file of NEB's non-putative protein sequences
    5. definitions (directory) - Our initial inquiry used to define our database. SI figures are found here in Defining_RM_types_5-3-2018.ipynb
  2. RMsearch

    1. src (directory) - holds source code for RM searches
    2. RMsearch_12-11-2018.ipynb - Notebook for RM annotation
    3. data (directory) - outputs of RMsearch notebook
  3. GenMemODE

    1. src (directory) - holds source code needed for all numerical simulations
    2. Computational_Figures.ipynb - initial figures used for understanding our simulations
    3. GenMemODE.ipynb - Numerical Simulations
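
To illustrate how the two HMM lists in RM_Database might be used together, here is a minimal sketch, not the project's actual annotation logic (see RMsearch/src for that). It assumes that RM_HMMs.txt and Falsepos_HMMs.txt hold one HMM identifier per line, which may not match the real file layout. The function names, category labels, and identifiers such as HMM_A are hypothetical.

```python
from pathlib import Path


def read_id_list(path):
    """Read identifiers from a text file, one per line (assumed format).

    Blank lines and lines starting with '#' are skipped; only the first
    whitespace-separated token on each line is kept.
    """
    ids = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line.split()[0])
    return ids


def classify_protein(hit_hmms, rm_hmms, falsepos_hmms):
    """Label a protein from the set of HMMs it matched.

    Hypothetical rule: any hit to a false-positive HMM flags the protein,
    otherwise any hit to an RM HMM marks it as a candidate RM gene.
    """
    hits = set(hit_hmms)
    if hits & falsepos_hmms:
        return "likely_false_positive"
    if hits & rm_hmms:
        return "candidate_rm"
    return "no_rm_hit"
```

In use, the two sets would be loaded once with read_id_list and then passed to classify_protein for each protein's list of HMM hits, for example hits parsed from HMMER output.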

Additional Database files for hhsuite and HMMs used in this study

UniProt
Protein Data Bank
Pfam 31.0
