Update tutorial.md
subinamehta authored Dec 15, 2023
1 parent 9b99a8d commit 4a69564
Showing 1 changed file with 12 additions and 12 deletions.
@@ -122,9 +122,9 @@ First, we want to generate a large comprehensive protein sequence database using
For this tutorial, a literature survey was conducted to obtain 118 taxonomic species of organisms that are commonly associated with the female reproductive tract ([Afiuni-Zadeh, S., Boylan, K.L.M., Jagtap, P.D. et al., 2018](https://doi.org/10.1038/s41598-018-29092-4)). This species list was used to generate a protein sequence FASTA database using the UniProt XML Downloader tool within the Galaxy framework. In this tutorial, the Species FASTA database (~3.38 million sequences) has already been provided as input. However, if you have your own list of species of interest as a tabular file (Your_Species_tabular.tabular), the steps to generate a FASTA file from that tabular file are included:


## Sub-step with **UniProt**
## Download Protein Sequences using UniProt XML downloader with **UniProt**

> <hands-on-title> Download Protein Sequences using UniProt XML downloader</hands-on-title>
> <hands-on-title> UniProt XML downloader</hands-on-title>
> 1. {% tool [UniProt](toolshed.g2.bx.psu.edu/repos/galaxyp/uniprotxml_downloader/uniprotxml_downloader/2.3.0) %} with the following parameters:
> - *"Select"*: `Your_Species_tabular.tabular`
> - {% icon param-file %} *"Dataset (tab separated) with Taxon ID/Name column"*: `output` (Input dataset)
@@ -190,19 +190,19 @@ For this tutorial, a literature survey was conducted to obtain 118 taxonomic spe



## Sub-step with **Protein Database Downloader**
## Download contaminants with **Protein Database Downloader**

> <hands-on-title> Download contaminants </hands-on-title>
> <hands-on-title> Protein Database Downloader </hands-on-title>
>
> 1. {% tool [Protein Database Downloader](toolshed.g2.bx.psu.edu/repos/galaxyp/dbbuilder/dbbuilder/0.3.4) %} with the following parameters:
> - *"Download from?"*: `cRAP (contaminants)`
> 2. Rename as "Protein Database Contaminants (cRAP)"
>
{: .hands_on}

## Sub-step with **Protein Database Downloader**
## Download Human SwissProt (reviewed) database with **Protein Database Downloader**

> <hands-on-title> Download Human SwissProt (reviewed) database </hands-on-title>
> <hands-on-title> Protein Database Downloader</hands-on-title>
> 1. {% tool [Protein Database Downloader](toolshed.g2.bx.psu.edu/repos/galaxyp/dbbuilder/dbbuilder/0.3.4) %} with the following parameters:
> - *"Download from?"*: `UniProtKB(reviewed only)`
> - In *"Taxonomy"*: `Homo sapiens (Human)`
@@ -227,10 +227,10 @@ For this tutorial, a literature survey was conducted to obtain 118 taxonomic spe
{: .question}


## Sub-step with **FASTA Merge Files and Filter Unique Sequences**
## Merging databases to obtain large comprehensive database for MetaNovo with **FASTA Merge Files and Filter Unique Sequences**
Once generated, the Species UniProt database (~3.38 million sequences) will be merged with the Human SwissProt database (reviewed only; ~20.4K sequences) and the contaminant (cRAP) sequence database (116 sequences), then filtered for unique sequences to generate the large comprehensive database (~2.59 million sequences). MetaNovo will then use this large comprehensive database to generate a compact database, which is much more manageable.
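
For readers who want to see what merging and filtering amounts to, the following is a minimal Python sketch of the idea, not the Galaxy tool's implementation. It assumes uniqueness is judged on the sequence alone and that the first occurrence of each sequence is kept; the function names (`parse_fasta`, `merge_unique`) and the toy records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_unique(*databases: Iterable[str]) -> List[Tuple[str, str]]:
    """Merge FASTA inputs in order, keeping the first record per sequence.

    Assumption: two records are duplicates when their sequences are identical.
    """
    seen = set()
    merged = []
    for database in databases:
        for header, sequence in parse_fasta(database):
            if sequence not in seen:
                seen.add(sequence)
                merged.append((header, sequence))
    return merged


# Toy stand-ins for the species, human and cRAP databases (hypothetical records).
species = [">sp|A|x", "MKT", "AAA", ">sp|B|y", "GGG"]
human = [">sp|C|z", "MKTAAA", ">sp|D|w", "CCC"]
crap = [">crap1", "GGG"]

merged = merge_unique(species, human, crap)
for header, sequence in merged:
    print(f">{header}\n{sequence}")
```

Because duplicates are dropped, the merged database can be smaller than the sum of its inputs, which is consistent with the sequence counts quoted above.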

> <hands-on-title> Merging databases to obtain large comprehensive database for MetaNovo. </hands-on-title>
> <hands-on-title> FASTA Merge Files and Filter Unique Sequences </hands-on-title>
>
> 1. {% tool [FASTA Merge Files and Filter Unique Sequences](toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0) %} with the following parameters:
> - *"Run in batch mode?"*: `Merge individual FASTAs (output collection if input is collection)`
@@ -245,10 +245,10 @@ Once generated, the Species UniProt database (~3.38 million sequences) will be m

# Reducing Database size

## Sub-step with **MetaNovo**
## Metanovo tool generates a compact database from your comprehensive database with **MetaNovo**
Next, the large comprehensive database of ~2.59 million sequences can be reduced using the MetaNovo tool to generate a more manageable database that contains identified proteins. The compact MetaNovo-generated database (~1.9K sequences) will be merged with Human SwissProt (reviewed only) and contaminants (cRAP) databases to generate the reduced database (~21.2k protein sequences) that will be used for peptide identification (see [Discovery Module tutorial](https://github.com/subinamehta/training-material/blob/main/topics/proteomics/tutorials/clinical-mp-discovery/tutorial.md)).

> <hands-on-title> Metanovo tool generates a compact database from your comprehensive database. </hands-on-title>
> <hands-on-title> MetaNovo </hands-on-title>
>
> 1. {% tool [MetaNovo](toolshed.g2.bx.psu.edu/repos/galaxyp/metanovo/metanovo/1.9.4+galaxy4) %} with the following parameters:
> - *"MGF Input Type"*: `Collection`
@@ -306,9 +306,9 @@ Next, the large comprehensive database of ~2.59 million sequences can be reduced
>
{: .question}

## Sub-step with **FASTA Merge Files and Filter Unique Sequences**
## Merging databases to obtain reduced MetaNovo database for peptide discovery with **FASTA Merge Files and Filter Unique Sequences**

> <hands-on-title> Merging databases to obtain reduced MetaNovo database for peptide discovery. </hands-on-title>
> <hands-on-title> FASTA Merge Files and Filter Unique Sequences</hands-on-title>
>
> 1. {% tool [FASTA Merge Files and Filter Unique Sequences](toolshed.g2.bx.psu.edu/repos/galaxyp/fasta_merge_files_and_filter_unique_sequences/fasta_merge_files_and_filter_unique_sequences/1.2.0) %} with the following parameters:
> - *"Run in batch mode?"*: `Merge individual FASTAs (output collection if input is collection)`