Taken from http://consurf.tau.ac.il/overview.html
The ConSurf server [1] is a bioinformatics tool for estimating the evolutionary conservation of amino/nucleic acid positions in a protein/DNA/RNA molecule based on the phylogenetic relations between homologous sequences. The degree to which an amino (or nucleic) acid position is evolutionarily conserved is strongly dependent on its structural and functional importance; rapidly evolving positions are variable while slowly evolving positions are conserved. Thus, conservation analysis of positions among members of the same family can often reveal the importance of each position for the structure or function of the protein (or nucleic acid). In ConSurf, the evolutionary rate is estimated based on the evolutionary relatedness between the protein (DNA/RNA) and its homologues, considering the similarity between amino (nucleic) acids as reflected in the substitution matrix [2,3]. One of the advantages of ConSurf in comparison to other methods is the accurate computation of the evolutionary rate using either an empirical Bayesian method or a maximum likelihood (ML) method [3].
1. Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E. and Ben-Tal, N. (2003) Bioinformatics, 19, 163-164.
2. Mayrose, I., Graur, D., Ben-Tal, N. and Pupko, T. (2004) Mol. Biol. Evol., 21, 1781-1791.
3. Pupko, T., Bell, R.E., Mayrose, I., Glaser, F. and Ben-Tal, N. (2002) Bioinformatics, 18, S71-S77.
Below you will find guidelines on how to successfully install ConSurf on your machine. This manual is written for Debian-based systems.
For ConSurf to work properly, please install the following packages:
autoconf
g++
libexpat-dev
libtool
perl
or simply run
sudo apt-get install autoconf g++ libexpat-dev libtool perl -y
It is mandatory to have Perl installed on your machine. After Perl is installed, start the CPAN shell to download and install the required Perl modules:
sudo cpan
and at the cpan> prompt run:
force install Bio::Perl Config::IniFiles List::Util
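To quickly confirm that the modules were installed, you can try loading them from the command line (a minimal check; it only verifies that each module can be loaded):
perl -MBio::Perl -MConfig::IniFiles -MList::Util -e 'print "all Perl modules load fine\n"'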
ConSurf uses several external programs to compute its output. The copyrights to these programs belong to their respective owners. PLEASE MAKE SURE YOU FOLLOW THE LICENSE OF EACH OF THESE PROGRAMS.
The software can be found in the folder 3rd party software. Below are the instructions to install each program.
Rate4Site is the main algorithm behind ConSurf.
Installation:
The sources are already in the folder 3rd party software. ConSurf requires both the fast and the slow versions; to build both of them, simply run the rate4site_build.sh script:
./rate4site_build.sh
ClustalW - can be downloaded from EBI (ftp://ftp.ebi.ac.uk/pub/software/clustalw2/); an executable is already provided in the 3rd party software folder.
blastpgp can be found in 3rd party software/blast-2.2.26/bin
cd-hit can be found in 3rd party software/cdhit
To compile it, simply run:
./cd-hit_build.sh
("Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik Bioinformatics, (2006) 22:1658-9)
MUSCLE can be found in 3rd party software
("MUSCLE: a multiple sequence alignment method with reduced time and space complexity". Edgar R.C. (2004), BMC Bioinformatics 5: 113.)
Please download the SwissProt and TrEMBL databases and format them for BLAST searches using formatdb, which is part of the BLAST package (see the formatdb manual).
- Update SWISSPROT_DB in the configuration file (see below) to point to the location of the SwissProt database.
- Update UNIPROT_DB in the configuration file (see below) to point to the location of the TrEMBL database.
The databases can be downloaded in FASTA format from [here](http://www.uniprot.org/downloads), or just run the following commands:
wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
gunzip uniprot_sprot.fasta.gz
wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
gunzip uniprot_trembl.fasta.gz
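The formatting step itself is not spelled out above; below is a sketch using formatdb, assuming the copy bundled with blast-2.2.26 under 3rd party software (or any other formatdb) is on your PATH:
# -i: input FASTA file, -p T: treat the input as a protein database
formatdb -i uniprot_sprot.fasta -p T
formatdb -i uniprot_trembl.fasta -p T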
In order to run ConSurf, it is necessary to update the configuration file consurfrc.default:
- In the [programs] section, enter the full path to the executables of the 3rd party software covered earlier (example: MUSCLE=/home/path_to_ConSurf/3rd party software/muscle3.8.31).
- In the [databases] section, enter the full path to the databases (example: SWISSPROT_DB=/home/path_to_DB_Folder/uniprot_sprot.fasta).
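A minimal sketch of what the two sections might look like; only the MUSCLE, SWISSPROT_DB and UNIPROT_DB keys come from the examples above, and the key names for the remaining programs (ClustalW, Rate4Site, blastpgp, cd-hit) should be copied from the consurfrc.default shipped with ConSurf:
[programs]
# path to the MUSCLE executable bundled under "3rd party software"
MUSCLE=/home/path_to_ConSurf/3rd party software/muscle3.8.31

[databases]
# formatted FASTA databases downloaded from UniProt
SWISSPROT_DB=/home/path_to_DB_Folder/uniprot_sprot.fasta
UNIPROT_DB=/home/path_to_DB_Folder/uniprot_trembl.fasta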
To install ConSurf, run the following commands in the ConSurf folder:
aclocal
automake --force-missing --add-missing
autoconf
./configure
sudo make
sudo make install
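As a quick sanity check after installation (assuming make install puts the consurf script on your PATH):
command -v consurf   # should print the path of the installed script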
ConSurf works in several modes:
- Given Protein PDB File.
- Given Multiple Sequence Alignment (MSA) and Protein PDB File.
- Given Multiple Sequence Alignment (MSA), Phylogenetic Tree, and PDB File.
The script uses the user-provided MSA (and phylogenetic tree, if available) to calculate the conservation score for each position in the MSA, based on the Rate4Site algorithm [2].
When running in the first mode, the script builds the MSA for the given protein based on the ConSurf protocol.
Usage:
consurf -PDB <PDB FILE FULL PATH> -CHAIN <PDB CHAIN ID> -Out_Dir <Output Directory>
Requirements for a successful run:
- The user must have <PDB_ID>_<CHAIN_ID>_protein_seq.fas in the directory specified as -workdir. The file should contain the protein sequence in FASTA format (a sketch of how to create it is shown right after this list). More information and a solution are available in Issue #14.
- The user is strongly encouraged to provide the <PDB_ID>_<CHAIN_ID>.protein_query.blast file. It is created by ConSurf in the directory specified as -workdir, but sometimes ConSurf may fail to find it. More information and a solution are available in Issue #17.
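A sketch of how the sequence file could be prepared by hand for the example/1lk2.pdb chain A used in the examples below; the file name follows the <PDB_ID>_<CHAIN_ID>_protein_seq.fas template, the FASTA header shown is only an illustration, and the placeholder must be replaced with the real chain A sequence:
mkdir -p workdir
cat > workdir/1lk2_A_protein_seq.fas <<'EOF'
>1lk2_A
REPLACE_WITH_THE_SINGLE_LETTER_AMINO_ACID_SEQUENCE_OF_CHAIN_A
EOF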
-PDB <PDB FILE FULL PATH> - PDB File
-CHAIN <PDB CHAIN ID> - Chain ID
-Out_Dir <Output Directory> - Output Path
-MSA <MSA File Name>
(MANDATORY IF -m NOT USED)
-SEQ_NAME <"Query sequence name in MSA file">
(MANDATORY IF -m NOT USED)
-Tree <Phylogenetic Tree (in Newick format)>
(optional, default building tree by Rate4Site).
-m Build MSA mode
-MSAprogram ["CLUSTALW"] or ["MUSCLE"] # default: MUSCLE
-DB ["SWISS-PROT"] or ["UNIPROT"] # default: UniProt
-MaxHomol <Max Number of Homologs to use for ConSurf Calculation> # default: 50
-Iterat <Number of PSI-BLAST iterations> # default: 1
-ESCORE <Minimal E-value cutoff for Blast search> # default: 0.001
-BlastFile <pre-calculated blast file provided by the user>
(see http://consurf.tau.ac.il/overview.html#methodology and http://consurf.tau.ac.il/overview.html#MODEL for details)
-Algorithm [LikelihoodML] or [Bayesian] # default: Bayesian
-Matrix [JTT] or [mtREV] or [cpREV] or [WAG] or [Dayhoff] # default: JTT
- Basic Build MSA mode (using default parameters):
consurf -PDB example/1lk2.pdb -CHAIN A -Out_Dir output -m
Example:
cd ConSurf
consurf -PDB example/1lk2.pdb -CHAIN A -Out_Dir output -m --workdir workdir -BlastFile example/1lk2_A.protein_query.blast
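For the modes that use a pre-computed alignment, a hypothetical invocation could look like the following (the alignment file name and the query sequence name are placeholders and must match your own MSA; -Tree is optional):
consurf -PDB example/1lk2.pdb -CHAIN A -Out_Dir output --workdir workdir -MSA example/1lk2_A.aln -SEQ_NAME "1lk2_A"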
Taken from http://consurf.tau.ac.il/overview.html
Given the amino or nucleic acid sequence (can be extracted from the 3D structure), ConSurf carries out a search for close homologous sequences using BLAST (or PSI-BLAST) [4,5]. The user may select one of several databases and specify criteria for defining homologues. The user may also select the desired sequences from the BLAST results. The sequences are clustered and highly similar sequences are removed using CD-HIT [6]. A multiple sequence alignment (MSA) of the homologous sequences is constructed using MUSCLE (default) or CLUSTALW. The MSA is then used to build a phylogenetic tree using the neighbor-joining algorithm as implemented in the Rate4Site program [7]. Position-specific conservation scores are computed using the empirical Bayesian or ML algorithms [2,3]. The continuous conservation scores are divided into a discrete scale of nine grades for visualization, from the most variable positions (grade 1) colored turquoise, through intermediately conserved positions (grade 5) colored white, to the most conserved positions (grade 9) colored maroon. The conservation scores are projected onto the protein/nucleotide sequence and on the MSA.
1. Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E. and Ben-Tal, N. (2003) Bioinformatics, 19, 163-164.
2. Mayrose, I., Graur, D., Ben-Tal, N. and Pupko, T. (2004) Mol. Biol. Evol., 21, 1781-1791.
3. Pupko, T., Bell, R.E., Mayrose, I., Glaser, F. and Ben-Tal, N. (2002) Bioinformatics, 18, S71-S77.
4. Altschul, S.F., Wootton, J.C., Gertz, E.M., Agarwala, R., Morgulis, A., Schaffer, A.A. and Yu, Y.K. (2005) FEBS J., 272, 5101-5109.
5. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Nucleic Acids Res., 25, 3389-3402.
6. Li, W. and Godzik, A. (2006) Bioinformatics, 22, 1658-1659.
7. Pupko, T., Bell, R.E., Mayrose, I., Glaser, F. and Ben-Tal, N. (2002) Bioinformatics, 18, S71-S77.
In the folder vagrant_ConSurf you can find the Vagrantfile needed to run a Vagrant box with ConSurf on it.
Installation of Vagrant itself will not be explained here.
In order to start the Box, execute the following:
cd /<path_to_dir>/vagrant_ConSurf
vagrant up --no-destroy-on-error
vagrant ssh
After the box has started and you are connected to the VM through SSH, run export LC_ALL=C.
All dependency libraries from the Installing Libraries section should have been installed during the execution of vagrant up; the rest can be set up according to the installation guidelines presented earlier.