Skip to content

A package for predicting plasmid permissiveness from 16s rRNA data

License

Notifications You must be signed in to change notification settings

DaneshMoradigaravand/PlasmidPerm

Repository files navigation

Machine Learning aided Prediction of Plasmid Permsiveness

Table of contents

  1. Citation
  2. Introduction
  3. Installation
  4. Manual
  5. Output
  6. Supplemental files
  7. Contact

Citation

Citation

If you use the code or findings from this study, please cite the following article:

Danesh Moradigaravand, Liguan Li, Arnaud Dechesne, Joseph Nesme, Roberto de la Cruz, Huda Ahmad, Manuel Banzhaf, Søren J Sørensen, Barth F Smets, Jan-Ulrich Kreft
Plasmid Permissiveness of Wastewater Microbiomes can be Predicted from 16S rRNA Sequences by Machine Learning,
Bioinformatics, 2023;, btad400
https://doi.org/10.1093/bioinformatics/btad400

For any questions or inquiries, please contact the authors.


Introduction

The package predicts a relative value for permissiveness from any arbitrary 16s rRNA input data, using a random forest model. Plasmid permissiveness is the ability of recipient bacteria to receive external DNA through the mechanism of conjugation. Prediction is made in the relative model, i.e. the permissivenss value is compared with other strains in the training dataset and reported as the % of strains having a smaller permissiveness in the training dataset. Furthermore, the package reports the closest systematic type, based on the selected rank, to the input sequence. The prediction is made for the broad-range plasmids of pB10, pKJK5 and RP4.


Installation

There are three ways to run the tool:

  • The package may be downloaded and run as a binary file, plasmidperm.bin, as ./plasmidperm.bin

  • The tool is available on DockerHub and may be fetched and run using the following commmands:

docker pull daneshmoradigaravand/plasmidperm:tagname
docker run -v $PWD:/data --rm -it plasmidperm-docker ./app.py -i input_fasta_file -o /data/output_report

input_fasta_file and output_report should be names as the input fasta and output report files, respectively.

  • streamlite application. The graphical interface can be run usin streamlit. You need to navigate into the streamlit directory and launch the application, usin the following command:
streamlit run app.py

The commmand provides a link to the following front web application:


Manual

The tools is initiated using the binary command. The help instruction is called using -h option. Note in the multifasta file base U need to be replaced by T.

usage: plasmidperm.bin [-h] -i INPUT -o OUTPUT [-p {pB10,pKJK5,RP4}] [-r {Kingdom,Phylum,Class,Order,Family,Genus}] [-t TREE]

optional arguments:
  -h, --help            show this help message and exit
  -p {pB10,pKJK5,RP4}, --plasmid {pB10,pKJK5,RP4}
                        Plasmid specific prediction (default: pKJK5)
  -r {Kingdom,Phylum,Class,Order,Family,Genus}, --rank {Kingdom,Phylum,Class,Order,Family,Genus}
                        The closest rank to the input sequence (default: Order)
  -t TREE, --tree TREE  Produce phylogeny tree newick format (default True) (default: True)

Required arguments:
  -i INPUT, --input INPUT
                        input multifasta 16s rRNA (default: None)
  -o OUTPUT, --output OUTPUT
                        Output file name (default: stdout)


Output

The tool produces the following files

  1. output_file.csv file with the followinig structure: Input plasmid Closest Family Greater Than % Baseline Population
Tag Sequence Plasmid Closest Systematic Rank Greater Than % Baseline Population
seq1 AGCTGTGGGTTTA pB10 Pseudomonadaceae 95
seq2 AACCCGCGAGGAA pB10 Aeromonadaceae 65

Tag: The tags corresponding to the tas in the multifasta input file.

Sequence: The sequences from the multifasta file.

Plasmid: The plasmid for which perissiveness of reccipients are prredicted.

Closest Systematic Rank: The closest rank based on the Eucledian distance between the kmer profile for the sequence and sequuences in the training dataset.

Greater Than % Baseline Population: The percentage of isolates in the training dataset which had a permissiveness smmaller than the predicted permissiveness.

  1. binary_presence_absence_kmers.fasta The file contain binary semim-sequence file for the presence and absence of kmers denoted by A and C bases, respectively.

  2. binary_presence_absence_kmers.tre The phylogenetic neighbour-joining tree in the newick format made from the sequence distance matrix of the kmer sequences. The tree can be visulaized in Figtree.


Supplemental files

These files are located in the SupplementalFiles folder and include:

  1. Training_Code.ipynb The python code used for training the model.

  2. input_file.csv The input files used for training.


Contact

For queries, please contact Danesh Moradigaravand, Laboratory of Infectious Disease Epidemiology, KAUST Smart Health Initiative, KAUST.


About

A package for predicting plasmid permissiveness from 16s rRNA data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published