forked from lmsac/DeepDIA

Using deep learning to generate in silico spectral libraries for data-independent acquisition analysis. You can also use the online service powered by Omicsolution.

omidroshani/DeepDIA

DeepDIA

Using deep learning to generate in silico spectral libraries for data-independent acquisition (DIA) analysis.

System Requirements

  • Python >= 3.5. Anaconda is recommended.
  • Keras with TensorFlow backend.
  • R. As an alternative, the latest version of Microsoft R Open should be fine.
  • RStudio is recommended but optional.

DeepDIA has been tested on a workstation with an Intel Core i9-7960X CPU, 128 GB RAM, and the 64-bit Microsoft Windows 10 Version 1809 (OS Build 17763.503) operating system. For model training, a GPU that supports Compute Unified Device Architecture (CUDA) is recommended, e.g. an NVIDIA GeForce GTX 1050 Ti.

Installation

1. Install Python (Anaconda)

Download the Anaconda installer from https://www.anaconda.com/distribution/.

DeepDIA has been tested using Anaconda 4.2.0 (Python 3.5.2).

2. Install TensorFlow and Keras

Install TensorFlow using pip:

pip install --upgrade tensorflow

For the GPU-supported version:

pip install --upgrade tensorflow-gpu

TensorFlow documentation is available at https://www.tensorflow.org/.

Install Keras:

pip install keras

Keras documentation is available at https://keras.io/.

DeepDIA has been tested using Keras 2.2.4 and TensorFlow 1.11.
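
To compare an environment against the tested versions above, a small check along the following lines may help. It is only a sketch: it assumes Python 3.8 or later (for `importlib.metadata`, which is newer than the Python 3.5.2 used in testing), and the helper name `installed_versions` is hypothetical, not part of DeepDIA.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as used with pip above; the GPU build is installed
# under the separate name "tensorflow-gpu".
for name, ver in installed_versions(["tensorflow", "tensorflow-gpu", "keras"]).items():
    print(name, ver if ver is not None else "not installed")
```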

3. Install R and RStudio

R is available at https://www.r-project.org/. As an alternative, Microsoft R Open is available at https://mran.microsoft.com/open/.

RStudio is available at https://www.rstudio.com/.

DeepDIA has been tested using Microsoft R Open 3.5.1 and RStudio 1.1.447.

Start R and ensure that the packages readr and rjson are installed.

install.packages("readr")
install.packages("rjson")

The expected install time depends on the performance of the computer and on network conditions. A typical installation takes up to 30 min.

Guide to Generate Spectral Libraries for DIA Analysis

1. Prepare a Peptide List

A peptide list is stored in a comma-separated values (CSV) file that includes a column named sequence.

"protein","sequence"
"O43504","HDGITVAVHK"
"P56470","VGSSGDIALHINPR"
"Q9UHL4","LDHFNFER"
"P68371","IREEYPDR"
"P01024","AKDQLTCNK"
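
A file in this layout can be written and read back with Python's standard `csv` module. The sketch below is illustrative only: the helper names `write_peptide_list` and `read_sequences` are hypothetical, and quoting every field simply mirrors the example above rather than a documented requirement.

```python
import csv
import io


def write_peptide_list(handle, rows):
    """Write (protein, sequence) pairs as a fully quoted CSV with a header row."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["protein", "sequence"])
    writer.writerows(rows)


def read_sequences(handle):
    """Return the values of the 'sequence' column in file order."""
    return [row["sequence"] for row in csv.DictReader(handle)]


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_peptide_list(buffer, [("O43504", "HDGITVAVHK"), ("Q9UHL4", "LDHFNFER")])
buffer.seek(0)
print(read_sequences(buffer))  # ['HDGITVAVHK', 'LDHFNFER']
```

When working with a file on disk, open it with `newline=""` as the `csv` module documentation recommends.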

Peptides can be collected from public resources. Peptide lists collected from the Pan Human Library (Rosenberger, G. et al. Sci. Data 2014, 1, 140031) are provided as examples in the data/peptide folder:

  • Pan_human.peptide.csv
  • Pan_human_charge2.peptide.csv
  • Pan_human_charge3.peptide.csv

DeepDIA only supports peptide sequences with standard amino acids (ACDEFGHIKLMNPQRSTVWY) and length <= 50.
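
The constraint above can be checked before prediction with a short filter. This is a minimal sketch, not DeepDIA code: the names `is_supported` and `filter_supported` are hypothetical, and rejecting empty strings is an added assumption.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 50


def is_supported(sequence):
    """True if the sequence is non-empty, at most 50 residues long,
    and contains only the 20 standard amino acid letters."""
    return 0 < len(sequence) <= MAX_LENGTH and set(sequence) <= STANDARD_AMINO_ACIDS


def filter_supported(sequences):
    """Keep only the supported sequences, preserving their order."""
    return [s for s in sequences if is_supported(s)]


# 'X' is not a standard residue, and the last sequence is 51 residues long.
print(filter_supported(["HDGITVAVHK", "HDGITVAVHX", "A" * 51]))  # ['HDGITVAVHK']
```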

Peptide lists can also be generated by in silico digestion of protein sequences, e.g. the SwissProt Homo sapiens database downloaded from UniProt (https://www.uniprot.org/), with or without MS detectability filtering. See DeepDIA Demo: Spectral Library Generation From Proteome Databases and DeepDIA Demo: Spectral Library Generation with Detectability Prediction in the docs folder for details.
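
To make the idea of in silico digestion concrete, here is a minimal sketch. It assumes a trypsin-like rule (cleave after K or R unless the next residue is P), which this README does not specify; the function names and the `missed_cleavages` and `min_length` parameters are hypothetical. The demos in the docs folder describe the actual procedure.

```python
def cleavage_fragments(protein):
    """Split a protein after each K or R that is not followed by P (assumed rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def tryptic_digest(protein, missed_cleavages=0, min_length=1, max_length=50):
    """Join up to `missed_cleavages` + 1 adjacent fragments into peptides
    and keep those within the length limits (50 is the limit stated above)."""
    fragments = cleavage_fragments(protein)
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


# The K at position 3 is followed by P, so no cleavage occurs there.
print(tryptic_digest("AAKPGGRCCKDD"))  # ['AAKPGGR', 'CCK', 'DD']
```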

2. Predict MS/MS Spectra

Prepare a model for MS/MS prediction. You can use pre-trained models or train your own models. A model trained with HeLa data on Q Exactive HF is provided as an example in the data/models folder:

  • data/models/charge2
  • data/models/charge3

Copy the peptide list Pan_human_charge2.peptide.csv to the model directory data/models/charge2. Run deepms2/py/predict.py in the directory.

cd {PATH_TO_MODEL}
python {PATH_TO_CODE}/deepms2/py/predict.py

Predict MS/MS for charge 3+ following the same steps.
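
The copy-and-run steps for both charge states could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the paths are the placeholders used above, and it relies on the behavior described in this guide, namely that predict.py processes the peptide list found in the directory it is run from.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_prediction_job(code_dir, model_dir, peptide_csv):
    """Return (copy destination, command, working directory) for one model."""
    model_dir = Path(model_dir)
    destination = model_dir / Path(peptide_csv).name
    script = Path(code_dir) / "deepms2" / "py" / "predict.py"
    command = [sys.executable, str(script)]
    return destination, command, model_dir


def run_prediction(code_dir, model_dir, peptide_csv):
    """Copy the peptide list into the model directory and run predict.py there."""
    destination, command, cwd = build_prediction_job(code_dir, model_dir, peptide_csv)
    shutil.copy(str(peptide_csv), str(destination))
    subprocess.run(command, cwd=str(cwd), check=True)


# Example usage (placeholders, adjust to your installation):
# for charge in (2, 3):
#     run_prediction(
#         "PATH_TO_CODE",
#         "data/models/charge{}".format(charge),
#         "data/peptide/Pan_human_charge{}.peptide.csv".format(charge),
#     )
```

Passing `check=True` makes the script stop with an error if a prediction run fails, rather than continuing silently to the next charge state.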

3. Predict iRT

Prepare a model for iRT prediction. You can use pre-trained models or train your own models. A model trained with HeLa data on Q Exactive HF (Bruderer, R. et al. Mol. Cell. Proteomics 2017, 16, 2296-2309) is provided as an example in the data/models folder:

  • data/models/irt

Copy the peptide list Pan_human.peptide.csv to the model directory data/models/irt. Run deeprt/py/predict.py in the directory.

cd {PATH_TO_MODEL}
python {PATH_TO_CODE}/deeprt/py/predict.py

4. Generate Spectral Library

Move the predicted MS/MS and iRT files to the same directory as the peptide list.

  • Pan_human.peptide.csv
  • Pan_human_charge2.prediction.ions.json
  • Pan_human_charge3.prediction.ions.json
  • Pan_human.prediction.irt.csv
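
Before starting R, it can be useful to confirm that all four files listed above are in place. The check below is a hypothetical convenience, not part of DeepDIA; the file names are taken from the list above and the function name `missing_inputs` is an assumption.

```python
from pathlib import Path

EXPECTED_FILES = [
    "Pan_human.peptide.csv",
    "Pan_human_charge2.prediction.ions.json",
    "Pan_human_charge3.prediction.ions.json",
    "Pan_human.prediction.irt.csv",
]


def missing_inputs(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


# Check the current directory; an empty list means every input is present.
print(missing_inputs("."))
```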

Start R and run init.R to load the functions.

source("{PATH_TO_CODE}/init.R")

Set the peptide list directory as working directory and run generate_spectral_library_for_Spectronaut.R.

setwd("{PATH_TO_DATA}")
source("{PATH_TO_CODE}/generate_spectral_library_for_Spectronaut.R")

The expected run time depends on the number of peptides and the performance of the computer.

5. DIA Data Analysis

The output library file can be imported into Spectronaut. The Spectronaut manual is available at https://www.biognosys.com/shop/spectronaut.

Guide to Train a Model

1. Train an MS/MS Model

Prepare an ions file for MS/MS prediction. An ions file can be converted from SpectroMine fragment reports (CSV). The CSV report should be exported with the schema provided in the misc/SpectroMine_Report_Schema folder:

  • FragmentReport.rs

Start R and run deepms2/R/extract_ions_from_Spectronaut_report.R.

source("{PATH_TO_CODE}/deepms2/R/extract_ions_from_Spectronaut_report.R")

As an alternative, MaxQuant results (msms.txt) are also supported using deepms2/R/extract_ions_from_MaxQuant_report.R.

Run deepms2/py/train.py.

python {PATH_TO_CODE}/deepms2/py/train.py

2. Train an iRT Model

Prepare an iRT file for iRT prediction. An iRT file can be converted from SpectroMine fragment reports using deeprt/R/extract_irt_from_Spectronaut_report.R.

Run deeprt/py/train.py.

python {PATH_TO_CODE}/deeprt/py/train.py

3. Train an MS Detectability Model

Follow the instructions in DeepDIA Demo: Training a New Model for Detectability Prediction in the docs folder.

Publications

Yang, Y., Liu, X., Shen, C., Lin, Y., Yang, P., Qiao, L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 11, 146 (2020). https://doi.org/10.1038/s41467-019-13866-z.

License

DeepDIA (the main Python code) is distributed under a BSD license. See the LICENSE file for details. As an exception, the R scripts for data preprocessing require the rjson and readr packages, which are distributed under a GPL-2 license.

Contacts

Please report any problems directly to the GitHub issue tracker. You can also send feedback to [email protected].

