This repo contains the companion code for the manuscript: "Pepsickle rapidly and accurately predicts proteasomal cleavage sites for improved neoantigen identification" which can be found here.
All fully trained models have been deployed as a separate software package with instructions for installation and use. This can be found at: https://github.com/pdxgx/pepsickle
1. Download the code in this repository:

```
git clone https://github.com/pdxgx/pepsickle-paper.git
cd ./pepsickle-paper
```
2. Set up and install the necessary libraries (also requires Python 3 and MySQL to be installed). For the full list of Python requirements, see requirements.txt:

```
pip install -r requirements.txt
```
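If you prefer to keep the pinned requirements isolated from your system Python, a virtual environment works well. This is a sketch of our suggestion, not part of the paper's instructions; the environment name is arbitrary:

```shell
# Optional: install the requirements inside an isolated virtual
# environment so pinned versions do not conflict with system packages.
python3 -m venv pepsickle-env
. pepsickle-env/bin/activate
pip install -r requirements.txt
```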
3. Download the requisite datasets too large for repo upload.

Enter the ./data/raw/database_pulls directory and download the IEDB static data dump needed for analysis:

```
cd ./data/raw/database_pulls
wget http://www.iedb.org/downloader.php?file_name=doc/iedb_public.sql.gz
```

NOTE: Paper analysis was performed on data pulled June 29th, 2020. For identical reproduction, subset all extracted data, including IEDB data, to entries on or before that date.

The IEDB database ERD can also be found here.

The AntiJen database query feature is currently not working and has yet to be repaired; however, the processed data from previously working queries can be found here.

For comparing training samples to the human proteome background, download the human proteome from UniProt:

```
cd ../
wget https://www.uniprot.org/uniprot/?query=proteome:UP000005640%20reviewed:yes#
```
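The reproduction note above calls for subsetting extracted entries to the pull date. A minimal sketch of such a filter, assuming a CSV export with an ISO-format date in the third column (the file name and column index here are illustrative, not taken from the pipeline):

```shell
# Hypothetical example: keep the header plus any row dated on or before
# the paper's pull date. ISO dates (YYYY-MM-DD) compare correctly as
# strings. "extracted_data.csv" and column 3 are assumptions; adapt
# them to the actual extracted tables.
CUTOFF="2020-06-29"
awk -F',' -v cutoff="$CUTOFF" 'NR==1 || $3 <= cutoff' \
    extracted_data.csv > extracted_data_subset.csv
```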
4. Return to the main directory and edit the MASTER.sh script to include your MySQL user name and password, using nano or a text editor of your choice:

```
nano MASTER.sh
```

```
#### SETUP
## set working directory to base dir of project
cd /path/to/pepsickle-paper

## set temp environmental vars for mysql use
export MYSQL_USER=[USER]
export MYSQL_PWD=[PASSWORD]
```
5. Create the output directories expected by the pipeline. The following loop creates each directory listed in directory_list.txt:

```
while read -r d; do
  mkdir -p "$d"
done < directory_list.txt
```
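Before launching the pipeline, a quick sanity check (our suggestion, not part of the original instructions) that every listed directory now exists can save a failed run partway through:

```shell
# Print any directory from directory_list.txt that is still missing;
# no output means everything is in place.
while read -r d; do
  [ -d "$d" ] || echo "missing: $d"
done < directory_list.txt
```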
6. Run the following command to iterate through the data retrieval and processing steps. This script runs through the primary analysis and model training pipeline. Alternative models and options mentioned in the manuscript are also available but commented out for streamlining. Some steps are slow and are annotated as such in comment lines.

```
bash run_analysis.sh
```

NOTE: The validation analysis steps at the end of run_analysis.sh require the installation of pepsickle. To install pepsickle using the weights generated by this pipeline, follow step 7. For testing the deployed pepsickle tool on the included validation data, simply follow the installation steps on the pepsickle repo page.
7. Install pepsickle. For assessing performance on validation data, the deployed tool framework is used. To install pepsickle using newly trained model weights instead of those built in by default, change out of the pepsickle-paper directory and then follow these steps (note the relative path assumes pepsickle is cloned alongside pepsickle-paper):

```
git clone https://github.com/pdxgx/pepsickle
cd ./pepsickle/pepsickle
cp ../../pepsickle-paper/data/model_weights/model.joblib .
cd ../
pip install .
```
This will replace the default model.joblib file, containing pre-trained weights, with the weights trained through the rest of the pepsickle-paper pipeline.
This project aggregates data from a variety of databases, including:

Data from peer-reviewed literature was also aggregated. More details on paper-specific data can be found in the main text (Table X).