metagenlab/blast2profile

description

Build a whole-genome phylogeny and display presence/absence profiles for a set of sequences (based on tBLASTn results and a user-defined identity cutoff). See the examples below. Used to investigate the conservation of virulence factors in S. aureus complete genomes.

If multiple BLAST hits are colocalised (the coordinates of two hits overlap), only the hit with the highest identity is retained. The phylogeny is built with parsnp. The figure is made with the ete2 package.
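The overlap rule above can be illustrated with a minimal sketch. This is not the code used by blast2profile.py: the `Hit` type, the function names and the greedy strategy (keep the best hit, then drop anything overlapping it) are assumptions made for illustration, and all hits are assumed to come from the same target sequence.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical container for one tBLASTn hit on a target genome."""
    query: str
    start: int       # subject coordinates; start may be > end on the minus strand
    end: int
    identity: float  # percent identity


def span(hit: Hit) -> Tuple[int, int]:
    """Return the hit coordinates ordered as (low, high)."""
    return (min(hit.start, hit.end), max(hit.start, hit.end))


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one position."""
    a_low, a_high = span(a)
    b_low, b_high = span(b)
    return a_low <= b_high and b_low <= a_high


def keep_best_colocalised(hits: List[Hit]) -> List[Hit]:
    """Among overlapping hits, keep only the one with the highest identity."""
    kept: List[Hit] = []
    # Visit hits from best to worst identity; a hit is kept only if it
    # does not overlap a better hit that was already kept.
    for hit in sorted(hits, key=lambda h: h.identity, reverse=True):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: span(h)[0])
```

For example, with made-up coordinates and identities, a 98% hit at 100-400 and an 80% hit at 350-700 overlap, so only the first is kept, while a separate hit at 1000-1500 is unaffected.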

installation

conda install pyqt=4
conda install -c etetoolkit ete2
conda install -c bioconda mlst
conda install -c bioconda blast

simple plot

  • -r reference genome

  • -f directory containing target fasta files. Fasta files should contain a single sequence. The fasta header should match the label file (parameter -a) or one LOCUS of one gbk file (parameter -g).

  • -b fasta file to blast. Fasta entry headers are used as column labels (see example below).

  • -m mlst scheme (see tseemann mlst)

  • -id do not show identity values (only color scale)

  • -c amino acid identity cutoff (e.g. 95)

  • -a label mapping file (for the phylogeny)

  • fasta file should contain a single sequence (multiple contigs should be concatenated into a single sequence)

  • the label mapping file should have the following structure: fasta_header label

CP007447	genome 1
CP013959	genome 2
CP016856	genome 3
CP016858	genome 4
CP016861	genome 5
CP016863	genome 6
CP017094	genome 7
CP024649	genome 8
LT699704	genome 9

blast2profile.py -r CP016861.fna -f fasta -b VF_saureus_edit.fa -m saureus -id -c 95 -a labels.tab
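Because the fasta headers have to match the first column of the label mapping file, it can help to check this before running the tool. The following is a minimal sketch, not part of blast2profile.py: the function names are hypothetical, the mapping file is assumed to be tab-separated, and the fasta identifier is assumed to be the first whitespace-delimited token of the header line.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_labels(path: str) -> Dict[str, str]:
    """Read a two-column, tab-separated file: fasta_header <TAB> label."""
    labels: Dict[str, str] = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            labels[row[0].strip()] = row[1].strip()
    return labels


def first_fasta_id(path: Path) -> str:
    """Return the identifier of the first record in a fasta file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    return fields[0]
                break
    raise ValueError(f"no usable fasta header found in {path}")


def unlabelled_fasta_headers(fasta_dir: str, labels: Dict[str, str]) -> List[str]:
    """List fasta identifiers in fasta_dir that have no entry in labels."""
    missing = []
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.is_file():
            identifier = first_fasta_id(path)
            if identifier not in labels:
                missing.append(identifier)
    return missing
```

Calling `unlabelled_fasta_headers("fasta", read_labels("labels.tab"))` returns an empty list when every target fasta file has a label.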

  • the numbers in parentheses are the sequence types (ST) identified with mlst.

plot

use genbank to get labels for the phylogeny

  • -g gbk file(s) corresponding to the fasta files (used for the labels of the phylogeny). Fasta headers should match the gbk LOCUS.

blast2profile.py -r CP016861.fna -f fasta -b VF_saureus_edit.fa -g gbk/*gbff -m saureus -id -c 95

plot

show identity values

blast2profile.py -r CP016861.fna -f fasta -b VF_saureus_edit.fa -g gbk/*gbff -m saureus -id -c 95

plot

highlight some cells with different colors

  • specific cells can be highlighted if a filter is provided. The filter should have the following structure: protein_name genome_accession. The entries should match the headers of the fasta files.
  • cells are only colored if a significant tBLASTn hit is found in the corresponding sequence

VFG001273_hlgA	CP016858
VFG001274_hlgC	CP016858
VFG001275_hlgB	CP016858
VFG001276_lukF-PV	CP016858
VFG001277_lukS-PV	CP016858
VFG001289_clfA	CP013959
VFG001290_clfB	CP013959
VFG001291_cna	CP013959
VFG001292_hld	CP013959
VFG001293_hly/hla	CP013959
VFG001313_spa	CP013959
VFG001326_selq	CP013959
VFG001327_selk	CP013959

blast2profile.py -r CP016861.fna -f fasta -b VF_saureus_edit.fa -m saureus -c 95 -a labels.tab -n -l -s -bf highlight.tab -id
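The highlighting rule described above (a cell is colored only when the filter lists it and a hit is found) can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names and the three state names are hypothetical, a "significant" hit is assumed to mean an identity at or above the -c cutoff, and the identity values in the usage note are made up.

```python
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[str, str]  # (protein_name, genome_accession)


def read_highlight_pairs(lines: Iterable[str]) -> Set[Cell]:
    """Parse filter lines of the form: protein_name <whitespace> genome_accession."""
    pairs: Set[Cell] = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def cell_state(
    protein: str,
    genome: str,
    identities: Dict[Cell, float],
    cutoff: float,
    highlighted: Set[Cell],
) -> str:
    """Classify one cell of the profile as absent, present or highlighted."""
    identity = identities.get((protein, genome))
    # Assumption: no hit, or a hit below the cutoff, counts as absent,
    # and an absent cell is never highlighted.
    if identity is None or identity < cutoff:
        return "absent"
    if (protein, genome) in highlighted:
        return "highlighted"
    return "present"
```

With a cutoff of 95, a listed pair whose best hit has 99.1% identity is reported as "highlighted", whereas a listed pair with no hit stays "absent".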

plot
