Using deep learning to generate in silico spectral libraries for data-independent acquisition (DIA) analysis.
- Python >= 3.5. Anaconda is recommended.
- Keras with TensorFlow backend.
- R. As an alternative, the latest version of Microsoft R Open should be fine.
- RStudio is recommended but optional.
DeepDIA has been tested on a workstation with Intel Core i9-7960X CPU, 128 GB RAM, and Microsoft Windows 10 Version 1809 (OS Build 17763.503) 64-bit operating system. For model training, a GPU card with Compute Unified Device Architecture (CUDA) is recommended, e.g. NVIDIA GeForce GTX 1050 Ti.
Download the Anaconda installer from https://www.anaconda.com/distribution/.
DeepDIA has been tested using Anaconda 4.2.0 (Python 3.5.2).
Install TensorFlow using pip:
pip install --upgrade tensorflow
For the GPU-supported version:
pip install --upgrade tensorflow-gpu
TensorFlow documentation is available at https://www.tensorflow.org/.
Install Keras:
pip install keras
Keras documentation is available at https://keras.io/.
DeepDIA has been tested using Keras 2.2.4 and TensorFlow 1.11.
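To compare a local environment with the tested versions, a small check along the following lines may help. This is a sketch, not part of DeepDIA: it assumes Python 3.8 or later (for importlib.metadata), the helper names installed_version and report are hypothetical, and the package names are the PyPI distribution names (a GPU install would appear as tensorflow-gpu instead of tensorflow).

```python
# Sketch: report installed versions next to the versions this README lists as tested.
# Assumes Python 3.8+; helper names are illustrative, not part of DeepDIA.
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in this README.
TESTED = {"Keras": "2.2.4", "tensorflow": "1.11"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(tested=TESTED):
    """Map each package name to (installed version or None, tested version)."""
    return {name: (installed_version(name), wanted) for name, wanted in tested.items()}


if __name__ == "__main__":
    for name, (found, wanted) in report().items():
        print("{}: installed={}, tested={}".format(name, found, wanted))
```

A result of None for a package only means that no distribution with that exact name was found in the current environment.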
R is available at https://www.r-project.org/. As an alternative, Microsoft R Open is available at https://mran.microsoft.com/open/.
RStudio is available at https://www.rstudio.com/.
DeepDIA has been tested using Microsoft R Open 3.5.1 and RStudio 1.1.447.
Start R and ensure that the packages readr and rjson have been installed.
install.packages("readr")
install.packages("rjson")
The expected installation time depends on the performance of the computer and on network conditions. A typical installation takes up to 30 min.
A peptide list is stored in a comma-separated values (CSV) file that includes a column named sequence.
"protein","sequence"
"O43504","HDGITVAVHK"
"P56470","VGSSGDIALHINPR"
"Q9UHL4","LDHFNFER"
"P68371","IREEYPDR"
"P01024","AKDQLTCNK"
Peptides can be collected from public resources.
From the Pan Human Library (Rosenberger, G. et al. Sci. Data 2014, 1, 140031), peptide lists have been collected and provided as an example in the data/peptide folder:
- Pan_human.peptide.csv
- Pan_human_charge2.peptide.csv
- Pan_human_charge3.peptide.csv
DeepDIA only supports peptide sequences with standard amino acids (ACDEFGHIKLMNPQRSTVWY) and length <= 50.
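The constraint above can be checked before prediction. The following is a minimal sketch, not DeepDIA code: the function names are hypothetical, the entry "X00000","PEPTIDEX" is an invented row used only to show a rejection, and treating an empty sequence as unsupported is an assumption.

```python
# Sketch: filter a peptide list by the stated constraints
# (standard amino acids only, length <= 50). Names are illustrative.
import csv
import io

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 50


def is_supported(sequence):
    """True if the sequence is non-empty, at most 50 residues long,
    and contains only standard amino acids."""
    return 0 < len(sequence) <= MAX_LENGTH and set(sequence) <= STANDARD_AMINO_ACIDS


def filter_peptides(csv_text):
    """Split the rows of a peptide list (CSV text with a 'sequence' column)
    into supported and unsupported rows."""
    kept, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        (kept if is_supported(row["sequence"]) else rejected).append(row)
    return kept, rejected


# The last row is a made-up entry containing a non-standard residue (X).
example = '''"protein","sequence"
"O43504","HDGITVAVHK"
"P56470","VGSSGDIALHINPR"
"X00000","PEPTIDEX"
'''
kept, rejected = filter_peptides(example)
print([row["sequence"] for row in kept])      # ['HDGITVAVHK', 'VGSSGDIALHINPR']
print([row["sequence"] for row in rejected])  # ['PEPTIDEX']
```

To filter a file on disk, the same function can be given the file contents, for example the text read from Pan_human.peptide.csv.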
Peptide lists can also be generated by in silico digestion of protein sequences, e.g. the SwissProt Homo sapiens database downloaded from UniProt (https://www.uniprot.org/), with or without MS detectability filtering. See DeepDIA Demo: Spectral Library Generation From Proteome Databases and DeepDIA Demo: Spectral Library Generation with Detectability Prediction in the docs folder for details.
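For illustration, in silico digestion can be sketched as follows. This is not the procedure from the DeepDIA demos: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), the default minimum length of 7 is an arbitrary choice, and the protein string is a made-up toy sequence. Only the maximum length of 50 comes from the constraint stated above.

```python
# Sketch of tryptic in silico digestion under assumed rules; names are illustrative.


def cleave(protein):
    """Split a protein after every K or R that is not followed by P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # C-terminal remainder
    return fragments


def digest(protein, missed_cleavages=0, min_length=7, max_length=50):
    """Return peptides with up to `missed_cleavages` missed cleavage sites,
    keeping only those within the length limits."""
    fragments = cleave(protein)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


toy_protein = "MAGKTLRPEEKSSR"  # toy sequence, not a real protein
print(cleave(toy_protein))                      # ['MAGK', 'TLRPEEK', 'SSR']
print(digest(toy_protein))                      # ['TLRPEEK']
print(digest(toy_protein, missed_cleavages=1))  # ['MAGKTLRPEEK', 'TLRPEEK', 'TLRPEEKSSR']
```

The resulting peptides could then be written to a CSV file with a sequence column, as described for peptide lists above.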
Prepare a model for MS/MS prediction.
You can use pre-trained models or train your own models. A model trained with HeLa data on Q Exactive HF is provided as an example in the data/models folder:
- data/models/charge2
- data/models/charge3
Copy the peptide list Pan_human_charge2.peptide.csv to the model directory data/models/charge2.
Run deepms2/py/predict.py in the model directory:
cd {PATH_TO_MODEL}
python {PATH_TO_CODE}/deepms2/py/predict.py
Predict MS/MS for charge 3+ following the same steps.
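The copy-and-run steps above can also be scripted, which is convenient when repeating them for charge 2+ and 3+. The following is a sketch under assumptions: the helper names are hypothetical, predict.py is assumed to read the peptide list from the current working directory (as the manual steps imply), and the paths passed in stand for {PATH_TO_CODE} and {PATH_TO_MODEL}.

```python
# Sketch: stage a peptide list in a model directory and run predict.py there.
# Helper names are illustrative, not part of DeepDIA.
import shutil
import subprocess
import sys
from pathlib import Path


def stage_peptide_list(peptide_csv, model_dir):
    """Copy the peptide list into the model directory and return the new path."""
    return Path(shutil.copy(str(peptide_csv), str(model_dir)))


def predict_command(code_dir):
    """Build the equivalent of `python {PATH_TO_CODE}/deepms2/py/predict.py`."""
    script = Path(code_dir) / "deepms2" / "py" / "predict.py"
    return [sys.executable, str(script)]


def run_prediction(peptide_csv, model_dir, code_dir):
    """Stage the peptide list, then run the prediction script with the
    model directory as the working directory (the `cd` step)."""
    staged = stage_peptide_list(peptide_csv, model_dir)
    subprocess.run(predict_command(code_dir), cwd=str(model_dir), check=True)
    return staged
```

For example, run_prediction("data/peptide/Pan_human_charge3.peptide.csv", "data/models/charge3", code_dir) would cover the charge 3+ case, where code_dir is the local path to the DeepDIA code.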
Prepare a model for iRT prediction.
You can use pre-trained models or train your own models. A model trained with HeLa data on Q Exactive HF (Bruderer, R. et al. Mol. Cell. Proteomics 2017, 16, 2296-2309) is provided as an example in the data/models folder:
- data/models/irt
Copy the peptide list Pan_human.peptide.csv to the model directory data/models/irt.
Run deeprt/py/predict.py in the model directory:
cd {PATH_TO_MODEL}
python {PATH_TO_CODE}/deeprt/py/predict.py
Move the predicted MS/MS and iRT files to the same directory as the peptide list:
- Pan_human.peptide.csv
- Pan_human_charge2.prediction.ions.json
- Pan_human_charge3.prediction.ions.json
- Pan_human.prediction.irt.csv
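Before starting R, it can be useful to confirm that all four files are in place. The check below is a sketch: the function name is hypothetical, and the file name pattern is inferred from the Pan_human example listed above rather than taken from the DeepDIA code.

```python
# Sketch: list which of the expected library inputs are missing from a directory.
# The naming pattern follows the Pan_human example; the function is illustrative.
from pathlib import Path


def missing_inputs(data_dir, name="Pan_human"):
    """Return the expected input file names that are not present in data_dir."""
    expected = [
        name + ".peptide.csv",
        name + "_charge2.prediction.ions.json",
        name + "_charge3.prediction.ions.json",
        name + ".prediction.irt.csv",
    ]
    return [f for f in expected if not (Path(data_dir) / f).is_file()]


if __name__ == "__main__":
    missing = missing_inputs(".")
    print("All inputs present." if not missing else "Missing: " + ", ".join(missing))
```

An empty list means the directory contains everything listed above.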
Start R and run init.R to load the functions.
source("{PATH_TO_CODE}/init.R")
Set the peptide list directory as the working directory and run generate_spectral_library_for_Spectronaut.R:
setwd("{PATH_TO_DATA}")
source("{PATH_TO_CODE}/generate_spectral_library_for_Spectronaut.R")
Expected run time depends on the number of peptides and the performance of the computer.
The output library file can be imported into Spectronaut. The Spectronaut manual is available at https://www.biognosys.com/shop/spectronaut.
Prepare an ions file for MS/MS prediction.
An ions file can be converted from SpectroMine fragment reports (CSV).
The CSV report should be exported with the schema provided in the misc/SpectroMine_Report_Schema folder:
- FragmentReport.rs
Start R and run deepms2/R/extract_ions_from_Spectronaut_report.R:
source("{PATH_TO_CODE}/deepms2/R/extract_ions_from_Spectronaut_report.R")
As an alternative, MaxQuant results (msms.txt) are also supported using deepms2/R/extract_ions_from_MaxQuant_report.R.
Run deepms2/py/train.py:
python {PATH_TO_CODE}/deepms2/py/train.py
Prepare an iRT file for iRT prediction.
An iRT file can be converted from SpectroMine fragment reports using deeprt/R/extract_irt_from_Spectronaut_report.R.
Run deeprt/py/train.py:
python {PATH_TO_CODE}/deeprt/py/train.py
Follow the instructions in DeepDIA Demo: Training a New Model for Detectability Prediction in the docs folder.
Yang, Y., Liu, X., Shen, C., Lin, Y., Yang, P., Qiao, L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 11, 146 (2020). https://doi.org/10.1038/s41467-019-13866-z.
DeepDIA (the main Python code) is distributed under a BSD license. See the LICENSE file for details.
As an exception, the R scripts for data preprocessing require the rjson and readr packages, which are under a GPL-2 license.
Please report any problems directly to the GitHub issue tracker. You can also send feedback to [email protected].