Modules for analysis of crosslinking MS/FAIMS data.
This repository supports the analysis in the following manuscript (TBD). The analysis can be partly reproduced by first installing the requirements from the Pipfile and then running:
snakemake -s .\run_xifaims.smk -j 8 --printshellcmds
Then open the xifaims_xgb_notebook to generate the ML results.
List of available modules:
- const - stores amino acid constants for feature computation
- features - computes and manages features
- ml - performs, documents and stores machine learning results
- parameters - stores XGB parameter grids
- plots - visual presentation of results
- processing - various pre- and post-processing tools
- seq_lib - processing functions for sequences borrowed from xiRT
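Machine learning setup:
- 80% train, 20% test
- 3-fold cross-validation on the 80% training split
- hyperparameters are selected with scikit-learn's GridSearchCV using the neg_mean_squared_error scoring (i.e. the mean squared error is minimized)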
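The setup above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the code from this repository: the data is synthetic, the parameter grid is hypothetical, and scikit-learn's GradientBoostingRegressor is swapped in for the xgboost regressor to keep the example light on dependencies (xgboost's XGBRegressor follows the same scikit-learn estimator interface).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): 200 samples, 5 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical, deliberately small grid; the real grids live in the
# parameters module and are not reproduced here.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# 3-fold cross-validation on the training split only. GridSearchCV maximizes
# the score, so the negated MSE is used to minimize the error.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Flip the sign back to report a regular (non-negative) MSE.
best_cv_mse = -search.best_score_
test_mse = float(np.mean((search.predict(X_test) - y_test) ** 2))
print(search.best_params_, best_cv_mse, test_mse)
```

The held-out 20% is only touched once, after the grid search has picked the best parameter combination on the cross-validated 80%.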
The main script that is executed via snakemake is xifaims_xgb.py. This file does the following:
goal: build a predictor (xgboost) for the CV (compensation voltage) based on sequence features.
steps:
- parse the prepared dataframe with CSMs (crosslink-spectrum matches)
- only use unique peptides (alpha/beta peptide and charge)
- only use TT (target-target) matches for machine learning
- split the data into 80/20 (training / validation)
- compute sequence-based features
- perform hyperparameter optimization for the xgboost regressor on the 80% split
- if enabled, perform feature selection and extract the most predictive features
- compute metrics from the training / validation split
- store metadata / data in a pickle file and an Excel file (for all possible objects)
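The data preparation steps in the list above (unique peptides, TT only, 80/20 split, sequence features) can be illustrated with pandas. This is only a sketch under stated assumptions: the column names (`alpha_peptide`, `beta_peptide`, `charge`, `is_tt`, `cv`), the example rows and the three toy features are hypothetical and do not reflect the actual dataframe layout or the feature set used by xifaims_xgb.py.

```python
import pandas as pd

# Hypothetical CSM table; all column names and values are assumptions.
csms = pd.DataFrame({
    "alpha_peptide": ["PEPTIDEK", "PEPTIDEK", "AKLMR", "GGKSTR",
                      "ELVISK", "MKTAYR", "SAKDER"],
    "beta_peptide": ["KAAR", "KAAR", "KDDE", "MKR", "LIVESK", "VKLR", "TKPR"],
    "charge": [3, 3, 4, 3, 4, 5, 3],
    "is_tt": [True, True, True, False, True, True, True],
    "cv": [-50.0, -50.0, -60.0, -45.0, -55.0, -65.0, -50.0],
})

# Keep one row per (alpha peptide, beta peptide, charge) combination.
unique = csms.drop_duplicates(subset=["alpha_peptide", "beta_peptide", "charge"])

# Keep only target-target matches.
unique = unique[unique["is_tt"]].reset_index(drop=True)

# 80/20 split into training and validation rows.
train_idx = unique.sample(frac=0.8, random_state=0).index
valid_idx = unique.index.difference(train_idx)

# Toy sequence-based features computed on the concatenated peptide pair.
seq = unique["alpha_peptide"] + unique["beta_peptide"]
features = pd.DataFrame({
    "length": seq.str.len(),
    "n_basic": seq.str.count("[KRH]"),   # Lys, Arg, His
    "n_acidic": seq.str.count("[DE]"),   # Asp, Glu
    "charge": unique["charge"],
})

train = features.loc[train_idx]
valid = features.loc[valid_idx]
print(len(unique), len(train), len(valid))
```

The resulting `train` table, together with the matching `cv` values, would then be handed to the hyperparameter search, while `valid` is kept aside for the final metrics.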
The main script can be parameterized to use only specific sets of features and hyperparameters. The parameters folder has a couple of examples (e.g. faims_all.yaml). Further documentation on the command line arguments can be retrieved by running the script with --help in the terminal.
Open the xifaims_xgb_notebook to interactively go through the results.
Clone this repo and then install it via pip (pip install -e .). Make sure to install the dependencies from the Pipfile. We recommend using pipenv for this.
- Sven Giese
- Ludwig Sinn