Rosetta Protein-folding workflow

This is a Pegasus workflow for running Rosetta's de novo structure prediction on the OSG. Starting from an amino acid sequence, the workflow predicts the three-dimensional structure of a protein using Rosetta's AbinitioRelax application. This workflow uses ideas from this tutorial.

Please run the workflow from your OSG Connect account. Anyone with a U.S. research affiliation can get access.

Configure Input files

You need a license to download Rosetta; see the Rosetta documentation for how to obtain one. Once you have the license, you can download the Rosetta software suite from https://www.rosettacommons.org/software/license-and-download.

Untar the downloaded file by running this command in your terminal:

    $ tar -xvzf rosetta[releasenumber].tar.gz

Binaries

The ab initio executable is located in rosetta*/main/source/bin. Navigate to this directory and copy the AbinitioRelax executable to the bin directory of the rosetta_workflow. Because the executable's file name can differ between Rosetta builds, make sure the file name in the last line of bin/proteinfold.sh matches the one you copied.
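The copy step and the name check can be scripted. The following is a minimal Python sketch that assumes the directory layout described above; install_abinitio and last_line_mentions are hypothetical helper names, not part of the workflow. When several AbinitioRelax* files match, it simply takes the first one in sorted order, so pass a more specific pattern if you need a particular build.

```python
import glob
import shutil
from pathlib import Path


def install_abinitio(rosetta_glob: str, workflow_dir: Path) -> Path:
    """Copy an AbinitioRelax* executable into <workflow_dir>/bin and return its path.

    Looks in <rosetta_glob>/main/source/bin, where rosetta_glob may contain a
    wildcard such as "rosetta*". If several files match, the first one in
    sorted order is used.
    """
    pattern = str(Path(rosetta_glob) / "main" / "source" / "bin" / "AbinitioRelax*")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no AbinitioRelax executable matches {pattern}")
    src = Path(matches[0])
    dest = workflow_dir / "bin" / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copy2 follows symlinks and preserves the permission bits (including +x)
    shutil.copy2(src, dest)
    return dest


def last_line_mentions(script: Path, exe_name: str) -> bool:
    """True if the last non-blank line of the script contains exe_name."""
    lines = [line for line in script.read_text().splitlines() if line.strip()]
    return bool(lines) and exe_name in lines[-1]
```

For example, `exe = install_abinitio("/path/to/rosetta*", Path("rosetta_workflow"))` followed by `last_line_mentions(Path("rosetta_workflow/bin/proteinfold.sh"), exe.name)` reports whether proteinfold.sh still needs editing. Since Rosetta's bin directory often holds symlinks into the build tree, following symlinks ensures the real binary is copied.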

Database

The Pegasus workflow takes the Rosetta database as input in the form of a tarball. Create a tar file of the database folder found in rosetta*/main and place it in the database directory of the workflow:

    $ cd [path to rosetta*]/main/ && tar -czf [path to rosetta workflow]/database/database.tar.gz database
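If you prefer to build the tarball from Python, the standard-library tarfile module can reproduce the command above. This is a sketch under the assumption that you pass the rosetta*/main directory and the desired destination explicitly; make_database_tarball is a hypothetical name.

```python
import tarfile
from pathlib import Path


def make_database_tarball(rosetta_main: Path, dest: Path) -> Path:
    """Roughly equivalent to: cd rosetta_main && tar -czf dest database"""
    src = rosetta_main / "database"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # "w:gz" writes a gzip-compressed archive; arcname="database" keeps the
    # top-level folder name inside the archive, as the tar command does.
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src, arcname="database")
    return dest
```

The database is large, so expect this to take a while; the archive layout (everything under a single database/ entry) is what matters for the job to unpack it correctly.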

Data inputs

Each job in the Rosetta workflow requires the following input files for one amino acid sequence:

FASTA file - Example: 1elwA.fasta
Fragment files - Example: aa1elwA03_05.200_v1_3 and aa1elwA09_05.200_v1_3
PDB file - Example: 1elw.pdb
PSIPRED secondary structure prediction (psipred_ss2) file - Example: 1elwA.psipred_ss2

Note: Rename the input files so that they all share the same base name, and give the folder containing them that base name as well.

Example: data-1.fasta, data-1.pdb, data-1.psipred_ss2, data-1-09_05.200_v1_3 and data-1-03_05.200_v1_3, in a folder named data-1.
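Renaming by hand is error-prone when you have many sequences. Below is a minimal Python sketch, assuming you supply an explicit mapping from each original file name to the suffix it should get; rename_to_base is a hypothetical helper, and the mapping in the docstring mirrors the 1elwA example above.

```python
from pathlib import Path


def rename_to_base(folder: Path, mapping: dict[str, str], base: str) -> Path:
    """Rename each file in `folder` to base + suffix, then rename the folder to base.

    `mapping` maps an original file name to its new suffix, e.g.
    {"1elwA.fasta": ".fasta", "1elw.pdb": ".pdb",
     "aa1elwA03_05.200_v1_3": "-03_05.200_v1_3"}.
    """
    for old_name, suffix in mapping.items():
        dest = folder / f"{base}{suffix}"
        if dest.exists():
            # Path.rename would silently overwrite on POSIX; refuse instead
            raise FileExistsError(dest)
        (folder / old_name).rename(dest)
    target = folder.with_name(base)
    if target.exists():
        raise FileExistsError(target)
    folder.rename(target)
    return target
```

Keeping the mapping explicit avoids guessing how each file name encodes the sequence ID, which differs between the FASTA, PDB and fragment files in the example.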

For each sequence, create a gzip-compressed tarball of the data-<i> folder that holds the input files above:

    $ tar -czf data-<i>.tar.gz data-<i>
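With many sequences it helps to check every folder before packing it. The sketch below assumes the required suffixes are exactly those in the data-1 example (the fragment-file suffixes in particular may differ for your data), and that the tarballs should be written to a directory you pass in, such as the workflow's inputs/ directory; REQUIRED_SUFFIXES, missing_inputs and pack_data_folders are hypothetical names.

```python
import tarfile
from pathlib import Path

# Assumed from the data-1 example; adjust if your fragment files are named differently.
REQUIRED_SUFFIXES = [".fasta", ".pdb", ".psipred_ss2",
                     "-03_05.200_v1_3", "-09_05.200_v1_3"]


def missing_inputs(folder: Path) -> list[str]:
    """Return the expected files (base name = folder name) that are absent."""
    base = folder.name
    return [base + s for s in REQUIRED_SUFFIXES if not (folder / (base + s)).is_file()]


def pack_data_folders(root: Path, out_dir: Path) -> list[Path]:
    """Validate every data-* folder under root, then write data-<i>.tar.gz for each."""
    folders = sorted(p for p in root.glob("data-*") if p.is_dir())
    # Check all folders first so nothing is packed if any input is missing
    problems = {f.name: m for f in folders if (m := missing_inputs(f))}
    if problems:
        raise FileNotFoundError(f"incomplete data folders: {problems}")
    out_dir.mkdir(parents=True, exist_ok=True)
    packed = []
    for folder in folders:
        tarball = out_dir / f"{folder.name}.tar.gz"
        with tarfile.open(tarball, "w:gz") as tar:  # gzip, like tar -czf
            tar.add(folder, arcname=folder.name)
        packed.append(tarball)
    return packed
```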

A proteinfold job is created for each file in inputs/. The workflow is a set of independent tasks, each running bin/proteinfold.sh, which takes one data tarball and the database tarball as input and produces a Rosetta silent file as output.
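The fan-out structure can be pictured with plain Python data, independent of the Pegasus API: one job record per file in inputs/, all sharing the same database tarball and none depending on another. This is only an illustration; ProteinfoldJob, plan_jobs and the <name>.out output naming are hypothetical, so check proteinfolding.py for the names the workflow actually uses.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProteinfoldJob:
    data_tar: str       # one data-<i>.tar.gz from inputs/
    database_tar: str   # shared by every job
    silent_out: str     # hypothetical silent-file name


def plan_jobs(inputs_dir: Path, database_tar: str = "database.tar.gz") -> list[ProteinfoldJob]:
    """Build one independent job per file in inputs_dir (no inter-job edges)."""
    jobs = []
    for f in sorted(p for p in inputs_dir.iterdir() if p.is_file()):
        stem = f.name.removesuffix(".tar.gz")  # requires Python 3.9+
        jobs.append(ProteinfoldJob(f.name, database_tar, f"{stem}.out"))
    return jobs
```

Because no job depends on another, all proteinfold jobs sit at the same level of the workflow, which is what allows Pegasus to cluster them horizontally.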

Note that clustering is enabled for the proteinfold transformation in the Transformation Catalog section of the workflow. The clusters_size profile tells Pegasus to group up to 10 proteinfold jobs into a single clustered job.

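    from pathlib import Path
    from Pegasus.api import Arch, Transformation

    TOP_DIR = Path(__file__).resolve().parent  # assumed: directory of proteinfolding.py

    # Bundle up to 10 proteinfold jobs into each clustered job
    proteinfold = Transformation(
            name="proteinfold",
            site="local",
            pfn=TOP_DIR / "bin/proteinfold.sh",
            is_stageable=True,
            arch=Arch.X86_64).add_pegasus_profile(clusters_size=10)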

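To disable clustering, set clusters_size to 1. Experiment with different values of clusters_size and observe how they affect the time required for the jobs to finish. Clustering only takes effect when the workflow is planned with horizontal clustering enabled (pegasus-plan --cluster horizontal).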
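Since every proteinfold job is at the same level, the number of jobs actually submitted is the number of inputs divided by clusters_size, rounded up. A small sketch of that arithmetic (clustered_job_count is a hypothetical helper, not part of Pegasus):

```python
def clustered_job_count(num_jobs: int, clusters_size: int) -> int:
    """Submitted jobs after horizontal clustering by size: ceil(num_jobs / clusters_size)."""
    if clusters_size < 1:
        raise ValueError("clusters_size must be >= 1")
    # Integer ceiling division, avoiding floating point
    return (num_jobs + clusters_size - 1) // clusters_size


# Example: a run with 100 input tarballs
for size in (1, 5, 10, 20):
    print(f"clusters_size={size}: {clustered_job_count(100, size)} submitted jobs")
```

Larger clusters mean fewer submissions and less scheduling overhead, but by default the tasks inside one cluster run one after another on the same slot, so very large clusters can lengthen the overall runtime.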

Run the workflow

Submit the workflow by executing proteinfolding.py.

    $ ./proteinfolding.py

You can use pegasus-status to check the status of the workflow and pegasus-statistics to display statistics once the workflow completes.

    $ pegasus-status [wfdir]
    $ pegasus-statistics [wfdir]

Outputs are automatically staged to /home/$USER/workflows/output.
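Once the workflow completes, a quick listing of the staged outputs helps spot jobs that produced nothing. This sketch makes no assumption about the silent-file names; it just reports every file in the output directory and flags zero-byte files, which may indicate a failed run. summarize_outputs and empty_outputs are hypothetical helpers.

```python
import os
from pathlib import Path


def summarize_outputs(output_dir: Path) -> dict[str, int]:
    """Map each staged output file name to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(output_dir.iterdir()) if p.is_file()}


def empty_outputs(output_dir: Path) -> list[str]:
    """Names of zero-byte output files."""
    return [name for name, size in summarize_outputs(output_dir).items() if size == 0]


if __name__ == "__main__":
    out = Path(os.path.expandvars("/home/$USER/workflows/output"))
    if out.is_dir():
        for name, size in summarize_outputs(out).items():
            print(f"{size:>12}  {name}")
        print("empty:", empty_outputs(out))
    else:
        print(f"{out} does not exist yet")
```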
