How to make QM-cluster models and input files with RINRUS

The RINRUS workflow can be run all in one go with the driver or done step by step with the individual scripts. Both ways of using RINRUS are described in this document.

Using the driver

The driver runs the entire RINRUS model building procedure from initial PDB structure to input file with one command (and a few interactive inputs along the way). The structure pre-processing described below in part 1 of the step-by-step usage instructions needs to be done before using the driver (except for protonation with reduce).

# Usage of the RINRUS driver
python $HOME/git/RINRUS/bin/RINRUS_driver.py -i driver_input

# Arguments:
-i FILE    rinrus driver input file (default: driver_input)

Input file

The driver_input file should contain the following details:

Path_to_scripts: [path to RINRUS bin directory]
PDB: [starting pdb file name]
Protonate_initial: [true/t/y or false/f/n]
Seed: [seed residues]
RIN_program: [probe/arpeggio/distance/manual]
Model(s): [model number or maximal or all]
Seed_charge: [overall charge of seed]
Multiplicity: [model multiplicity]
Computational_program: [gaussian/gau-xtb/orca/qchem/none]
# optional lines:
Must_include: [any non-seed fragments (SC, N-term. and/or C-term. of residue) that need to be in model, e.g. A:6:C,A:7:S+C+N,A:8:N]
RIN_info_file: [file to use with manual RIN input, default 'res_atoms.dat']
input_template_path: [path to input file template if not using standard templates]
Gaussian_basis_intmp: [true/t/y to use basis set info from Gaussian template rather than default basis sets]

Log file

The driver writes a log file called rinrus_log_[date].out with the details of the run/commands called.

Commands run by the driver:

# If reduce protonation selected
$HOME/git/RINRUS/bin/reduce -NOFLIP -Quiet PDB.pdb > PDB_h.pdb 

# If probe RIN selected:
$HOME/git/RINRUS/bin/probe -unformated -MC -self "all" -Quiet PDB_h.pdb > PDB.probe
$HOME/git/RINRUS/bin/probe2rins.py -f PDB.probe -s [seed residues]

# If arpeggio RIN selected
$HOME/git/RINRUS/bin/arpeggio/arpeggio.py PDB_h.pdb
$HOME/git/RINRUS/bin/arpeggio2rins.py -f PDB.contacts -s [seed residues]

# If distance RIN selected
$HOME/git/RINRUS/bin/pdb_dist_rank.py -pdb PDB_h.pdb -s [seed residues] -cut [cutoff] -type [avg/mass]

# Model trimming and capping
$HOME/git/RINRUS/bin/rinrus_trim2_pdb.py -s [seed residues] -pdb PDB_h.pdb -model N -c [res_atoms.dat/contact_counts.dat/res_atom-X.dat depending on RIN program] -mustadd [must_include fragments]
$HOME/git/RINRUS/bin/pymol_protonate.py -pdb res_N.pdb (if specified: -ignore_ids residues) (if specified: -ignore_atoms atoms) (if specified: -ignore_atnames atomtypes)

# Input file creation (if computational program not set to 'none')
$HOME/git/RINRUS/bin/write_input.py -format computational_program -c seed_charge -type hopt -noh res_N.pdb -adh res_N_h.pdb (if specified: -intmp input_template_path) (if specified: -basisinfo intmp)

Step by step usage of RINRUS

Examples throughout this section are based on the enzyme COMT (PDB:3BWM), with the active site being built around the Mg2+ ion, SAM cofactor and catechol substrate.

1. Preparing structure files

If your structure is not appropriately pre-processed then go through the following steps to protonate and clean up substrate molecules. This is the user's responsibility to verify, since RINRUS will generate models from provided structure and cannot yet run sanity checks.

After getting a raw PDB file (3bwm.pdb), check for nonstandard fragments and clean up atoms or residues with alternate positions/conformations. Listed below are some possible tasks that need to be done before starting the workflow:

a) If there are multiple conformations for a residue then keep only one conformation.

b) If starting from an MD simulation, check that metal coordination is correct. You may also want to delete neutralizing counterions if they have migrated into the active site.

c) If crystallographic symmetry is embedded into the pdb, one needs to be very careful and manually "unfold" the multimers from the PDB website. This might be more common with older X-ray crystal structures. RINRUS will eventually be improved to recognize this and automate symmetry unpacking.

d) We have seen situations where AMBER truncates atom names of two-letter elements when writing to PDB format (turning FE into F for example). If the PDB file comes from a source other than PDB repository or similar database, check all atom names, residue names, and residue IDs.
If the protein is not yet protonated, run reduce to generate a new protonated PDB file (3bwm_h.pdb) or protonate with other program of your choice. Skip this step if the pdb file is from an MD simulation since it has been protonated already.
```
# Example usage of reduce
$HOME/git/RINRUS/bin/reduce -NOFLIP 3bwm.pdb > 3bwm_h.pdb 
```
Check the new PDB file. Probe doesn't recognize covalent bonds between fragments, which becomes a problem with metalloenzymes and metal-ligand coordination. Our workaround is to replace a metal coordination center with an atom that probe recognizes as a H-bond donor or acceptor to capture interactions with the coordinating ligand atoms. For example in COMT, we replace Mg with O, and save as a new PDB file (3bwm_h_modify.pdb).
Check all ligands, make sure H atoms were added correctly (may need to delete or add more H based on certain conditions). The user needs to protonate ligand and substrate properly since reduce does not recognize most substrates and may add H improperly. Check the ligand 2D drawing for the PDB entry on the RCSB website for some guidance on substrate protonation if starting from scratch. Performing this step correctly is the user's responsibility! RINRUS will generate models from provided structure and does not yet have comprehensive sanity checks.

2. Selecting the active site/generating the active site RIN

Defining the "seed" is one of the most important steps of the QM-cluster model building process. What should you select as the seed? Typically, the seed will be the substrate(s) (or ligand in biochemical terms) participating in the chemical reaction. Any amino acid residues, co-factors, or fragments which participate in the active site catalytic breaking and forming of chemical bonds may also need to be included as part of the seed, but this will generate much larger models compare to only using the substrate.

The seed is specified as a comma-separated list of colon-separated Chain:ResidueID pairs. In the example of 3BWM, we select the seed as A:300 (Mg2+), A:301 (SAM) and A:302 (catechol).

If the PDB you're using does not have chain identifiers, you will need to specify ":XXX" where XXX is the residue ID number in this step and beyond. Our current defaults are wonky in these cases and need to be improved. If the protein is multimeric, use the chain of your choice for seed fragments. Note that some multimeric x-ray crystal structures may not necessarily have equivalent active sites!

The active site RIN can be automatically determined from the seed fragments based on probe contacts, arpeggio contacts or distance. For all selection metrics the key output is a file, usually called res_atoms.dat or similar, containing a list of the identified active site residues ranked by the chosen metric and the atoms identified by the selection procedure. This file will be used as the input for the trimming procedure in part 3.

2A. Using Probe contact count ranking

First run probe on the (modified) PDB file to generate a *.probe file of all contacts in the enzyme

# Example usage of probe:
$HOME/git/RINRUS/bin/probe -unformated -MC -self "all" 3bwm_h_modify.pdb > 3bwm_h_modify.probe

Then generate the active site RIN from the probe contacts with probe2rins.py

# Example usage of probe2rins:
python3 $HOME/git/RINRUS/bin/probe2rins.py -f 3bwm_h_modify.probe -s A:300,A:301,A:302

# All arguments for probe2rins:
-f FILE   probe contacts file
-s SEED   seed fragment(s) (e.g. A:300,A:301,A:302)

This produces freq_per_res.dat, rin_list.dat, res_atoms.dat, and *.sif.

Use res_atoms.dat as the input for the trimming procedure in part 3.

NOTE: Remember to replace the metal atom in the PDB before continuing/use the unmodified PDB for the remaining steps if it was replaced with O in the preprocessing.

2B. Using Arpeggio contact count or contact type ranking

Make sure openbabel libraries are available to properly use RINRUS with arpeggio.

First, run arpeggio to generate the contact file (you need to make sure config.py is in the same directory as arpeggio.py)

# Example usage of arpeggio
python3 ~/git/RINRUS/bin/arpeggio/arpeggio.py 3bwm_h.pdb

Then generate the active site RIN from arpeggio contacts using arpeggio2rins.py

# Example usage of arpeggio2rins
python3 ~/git/RINRUS/bin/arpeggio2rins.py -f 3bwm_h.contacts -s A:300,A:301,A:302

# All arguments for arpeggio2rins:
-f FILE   arpeggio contacts file
-s SEED   seed fragment(s) (A:300,A:301,A:302)

This produces the files contact_counts.dat, contype_counts.dat and, node_info.dat. Both contact_counts.dat and contype_counts.dat have the same format as res_atoms.dat.

Use contact_counts.dat (residues ranked by number of contacts) or contype_counts.dat (residues ranked by number of interaction types) as the input for the trimming procedure in part 3.

2C. Using distance ranking

There are two key options that determine how RINRUS calculates the distance between residues and the seed for distance-based selection and ranking.

Seed centre: centre of mass or the average of the Cartesian coordinates.
Hydrogens: distances can be calculated using all atoms or only heavy atoms (no hydrogens)

Use pdb_dist_rank.py to select all fragments with any atoms within a cutoff radius of the seed centre. The seed centre can be determined from only a few atoms instead of the full fragments by using the "-satom" flag instead of "-s".

# Example usage of pdb_dist_rank selecting residues within 5A of the full seed COM
python3 ~/git/RINRUS/bin/pdb_dist_rank.py -pdb 3bwm_h.pdb -s A:300,A:301,A:302 -cut 5 -type mass

# All arguments for pdb_dist_rank
-type CENTRE  how to determine centre of seed ("mass" or "avg")
-nohydro      ignore hydrogen atoms (true if flag present)
-cut COFF     cut off distance (default: 3 Å)
-s SEED       seed fragment(s) (e.g. A:300,A:301,A:302)
-satom ATOMS  seed atoms (e.g. A:301:C8,A:301:N9,A:302:C1,A:302:N1)

This produces the files res_atom-[COFF].dat, res_atom-[COFF]_new.dat, res_atom-[COFF]_new.csv and data_analysis_output_[COFF].csv. The latter 3 files separate out the side chain atoms and main chain atoms of each residue. This is currently not an ideal functional group-based partitioning and an updated workflow is in development.

Use res_atom-[COFF].dat as the input for the trimming procedure in part 3.

2D. Manual selection and ranking

You can generate your own res_atoms.dat file using an existing res_atoms.dat file as a template or from scratch.

The first two columns should list the chain and residue ID of a given residue
The third column should contain an arbitrary ranking value
The rest of the line should be the selected atom(s).

The residues should be listed in the order you want them to be added to the model.

Example format:

A    300      554      O   
A    40       491      CB   CE   CG   HB2  HE1  HE3  HG3  O    SD  
A    141      478      CB   CG   HB3  O    OD1  OD2 
A    143      378      CD1  CD2  CE2  CE3  CG   CH2  CZ2  CZ3  H    NE1 
A    91       335      CB   CG2  H    HB   HG21 HG22 N   
A    170      304      CG   HD21 ND2  OD1

Use your new res_atoms.dat as the input for the trimming procedure in part 3.

2E/2F. Combinatorial model building from probe and arpeggio

These instructions have not been updated/checked recently. Ignore for now...

Combinatorial model building from arpeggio

Generate arpeggio contact files as in section 2B to compute the combinations

python3 ~/git/RINRUS/bin/arpeggio/arpeggio.py 2cht_h-TS.pdb -s /A/203/

After generating the arpeggio contacts, the contact file is used for computing the combinations

Run combifromcontacts.py script with the defined seed (chain/residue numbers) which takes combinations of the different interactions

python3 ~/git/RINRUS/combi_script/combifromcontacts.py 2cht_h.contacts A/203 2cht_h.sift

This generates LongCombi.dat, SimpCombi.dat, and ModSimpCombi.dat

Run genmodelfiles.py to remove redundant models ModSimpCombi.dat file

python3 ~/git/RINRUS/combi_script/genmodelfiles.py ModSimpCombi.dat

Use res_atoms_.dat files generated to prepare modified list of models. The res_atoms_.dat files are then translated into their corresponding PDB files

ls res_atoms_*.dat > list

Open the list and remove "res_atoms_" and ".dat" to leave only the model numbers within list, run the command below after you

for i in `cat list`; do python3 ~/git/RINRUS/bin/rinrus_trim2_pdb.py -pdb 2cht_h.pdb -s A:203 -ratom res_atoms_${i}.dat; mv res_*_atom_info.dat atom_info_${i}.dat; mv res_*_froz_info.dat froz_info_${i}.dat; mv res_*.pdb model_${i}.pdb; done

To identify which pdbs are identical, create a new list of the various model pdb names and run the identifiles.py script

ls model_*.pdb > list

python3 ~/git/RINRUS/combi_script/identifiles.py list

The generated file (UniqueModels.dat) lists all the unique models and removes redundant models

Complete the valences by adding protons to severed bonds and waters via PyMol

for i in `cat list`; do python3 ~/git/RINRUS/bin/pymol_scripts.py ${i} 203; ~qcheng1/bin/pymol -qc log.pml; done

Combinatorial model building from probe

1. Refer to section 2A to generate probe files to compute combinations

Run gen-probe-combi.py script with the defined seed (chain/residue numbers/atom(s)) which takes combinations of the different interactions

python3 ~/git/RINRUS/git/combi_script/gen-probe-combi.py -f 2cht_h_ac_aligned.probe -seed A/203/C1,O1,C2,O2,C3,O3,C4,O4,C5,O5,C6,O7,C8,C9, C10,C11,HO5,H01,H02,H03,H04,H05,H06,H07

Note: Multiple seed indices can be indicated by space separation as A/202 A/202

Run the next step which combine multiple step but remember to edit the necessary part

ls -lrt| grep -v slurm |awk '{print $9}'|grep -E res_atoms_|cut -c 11-12|cut -d. -f1>list; mkdir pdbs; for i in `cat list`; do mkdir model-${i}-01; cd model-${i}-01; mv ../res_atoms_${i}.dat .; python3 ~/git/RINRUS/bin/rinrus_trim2_pdb.py -s A:203 -ratom res_atoms_${i}.dat -pdb ../2cht_h_ac_aligned.pdb; python3 ~/git/RINRUS/bin/pymol_scripts.py -ignore_ids 203 -pdbfilename *.pdb; cp *_h.pdb model-${i}_h.pdb; cp model-${i}_h.pdb ../pdbs/; cp res_atoms_${i}.dat ../pdbs/${i}.dat ; cd ..; done

Be aware that while the res_atoms_#.dat model sets generated are technically unique to each other, once RINRUS generates the full trimmed QM-models, a lot of the them will no longer be unique and will be identical to others. So there a need to check to determine which QM-models are still unique (and not redundant)

3. Trimming and capping the models

Trimming the models

Use rinrus_trim2_pdb.py to generate trimmed cluster models based on the atoms/residues listed in res_atoms.dat (or an equivalent file). If any atom from a functional group (amino acid side chain, peptide bond, water molecule, etc) is included in res_atoms.dat, the entire functional group will be included; the trimming algorithm only truncates the model at alpha carbons.

# Example usage of rinrus_trim2_pdb using a probe "res_atoms.dat" file
python3 ~/git/RINRUS/bin/rinrus_trim2_pdb.py -pdb 3bwm_h.pdb -s A:300,A:301,A:302

# Example usage of rinrus_trim2_pdb using an arpeggio "contact_counts.dat" file and making only the maximal model
python3 ~/git/RINRUS/bin/rinrus_trim2_pdb.py -pdb 3bwm_h.pdb -s A:300,A:301,A:302 -c contact_counts.dat -model max

# All arguments for rinrus_trim2_pdb
-pdb FILE       pre-processed PDB file (e.g. 3bwm_h.pdb)
-s SEED         seed fragment(s)) (e.g. A:300,A:301,A:302)
-c FILE         atom info file (default: res_atoms.dat)
-unfrozen ATS   residues to unfreeze (Chain:resID:<CA/CB/CACB>)
-model N        specify which model to create (number or 'max')
-mustadd	fragment(s) that must be in model (Chain:resID:<S/N/C/S+N/S+C/N+C/S+N+C>)

If no model is specified, the script will generate the entire "ladder" of possible models by adding residues based on their order in res_atoms.dat, otherwise only the model containing N residues will be created (or the maximal model if 'max' is specified). For each model, the files res_N.pdb, res_N_froz_info.dat and res_N_atom_info.dat are created.

If canonical residues are included in the seed, you can unfreeze CA and/or CB in these residues with the unfrozen flag. Specify the carbon atoms to unfreeze as CA, CB or CACB.

The mustadd flag is used to define groups that must be in the model but are not part of the seed (e.g. metal coordinating residues or groups covalently bonded to the seed). These are included independently of what is in res_atoms.dat so the model numbers might double count some groups.

Capping the truncation sites with hydrogens

Use pymol_protonate.py to add capping hydrogens to each res_N.pdb file where bonds were broken in the trimming procedure. You must have a local copy of PyMOL installed! The script generates a log.pml input file containing commands that perform the hydrogen addition and then runs it in PyMOL.

# Example usage of pymol_protonate
python3 $HOME/git/RINRUS/bin/pymol_protonate.py -pdb res_N.pdb -ignore_ids "A:300,A:301,A:302" 

# All arguments for pymol_scripts
-pdb FILE               pdb of model to be protonated
-ignore_ids CH:ID       fragment(s) to exclude from protonation (e.g. all seed fragments)
-ignore_atoms CH:ID:AT  specific atom(s) to exclude from protonation (e.g. A:25:C+O,A:25:N) 
-ignore_atnames ATOMS   atom type(s) to exclude (e.g. "ND1,NE2" to avoid changing histidine protonation. Be very careful with this!)

The "ignore_ids", "ignore_atoms" and "ignore_atnames" flags are used to specify residue IDs, specific atoms and atom types that should not be protonated. You will most likely want to put the seed fragments in the "ignore_ids" list, otherwise PyMOL might reprotonate your noncanonical amino acids/substrate molecules (and make very poor decisions). By default the script avoids re-protonating the end nitrogens in arginine (NH1 and NH2). If you use "ignore_ats" to avoid any other atom types generally (for example the nitrogens in histidine side chains), check your models carefully before proceeding!

4. Generating input files

Use write_input.py to generate input files for quantum chemistry packages. Currently input files can be created for Gaussian, xTB (through Gaussian), ORCA and Q-Chem. Some of the options for this script are specific to our QM-cluster modelling workflow and may not be relevant to other users.

Basic usage of write_input.py to create an input file to do a basic geometry optimisation + frequency calculation with a given cluster model res_N_h.pdb:

# Example usage of write_input: Gaussian
python3 $HOME/git/RINRUS/bin/write_input.py -pdb res_N_h.pdb -c 2 -format gaussian

# Example usage of write_input: xTB (in Gaussian)
python3 $HOME/git/RINRUS/bin/write_input.py -pdb res_N_h.pdb -c 2 -format gau-xtb

# Example usage of write_input: ORCA
python3 $HOME/git/RINRUS/bin/write_input.py -pdb res_N_h.pdb -c 2 -format orca

# Example usage of write_input: Q-Chem
python3 $HOME/git/RINRUS/bin/write_input.py -pdb res_N_h.pdb -c 2 -format qchem

# Useful arguments for basic usage of write_input:
-pdb FILE           pdb file to use as structure in input file
-m MULT             model multiplicity
-c LIGCHRG          ligand charge
-format PROG        software package (gaussian/gau-xtb/orca/qchem)
-intmp FILE         input template for selected software package (if none specified, the ones in $HOME/git/RINRUS/template_files/ are used)
-inpn NAME          name of input file (default: 1.inp)
-basisinfo 'intmp' (Gaussian only) use basis set info from the input template file instead of the library in $HOME/git/RINRUS/lib3/gaussian_basis_dict.py

Full set of arguments for using write_input.py:

# General arguments:
-m MULT            model multiplicity
-c LIGCHRG         ligand charge
-format PROG       software package (gaussian/gau-xtb/orca/qchem/psi4-fsapt)
-intmp FILE        input template for selected software package (defaults to the ones in $HOME/git/RINRUS/template_files/ if none specified)
-inpn NAME         name of input file (default: 1.inp or input.dat for psi4-fsapt)
-basisinfo 'intmp' (Gaussian only) use basis set info from the input template file instead of default RINRUS basis set library
-wdir PATH         working directory
-type TYPE         type of structure processing for input file:
                   'pdb': use pdb file normally (default)
                   'hopt': freeze all heavy atoms so only hydrogen atoms optimized
                   'gauout': take structure from Gaussian output
                   'replacecoords': update selected atom coords e.g. to create TS guess

# If using -type 'pdb' (default if no type specified), these are required:
-pdb FILE          pdb file

# If using -type 'hopt', these are required:
-noh FILE          cluster model pdb file before H added (e.g. res_N.pdb)
-adh FILE          cluster model pdb file after H added (e.g. res_N_h.pdb)

# If using -type 'gauout', these are required:
-tmp FILE          template pdb file for creating new pdb from output
-outf FILE         Gaussian output file
-ckp FILE          Gaussian checkpoint file

# If using -type 'replacecoords', these are required:
-pdb1 FILE         starting pdb file
-pdb2 FILE         pdb to take coordinates from
-parts FILE        text file containing species in pdb1 to be replaced with pdb2 coordinates

# If using -format 'psi4-fsapt', these are required:
-seed SEED         seed to define as fragment A, rest of enzyme will be fragment B

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

How to make QM-cluster models and input files with RINRUS

Using the driver

Input file

Log file

Step by step usage of RINRUS

1. Preparing structure files

2. Selecting the active site/generating the active site RIN

3. Trimming and capping the models

Trimming the models

Capping the truncation sites with hydrogens

4. Generating input files

Files

README.md

Latest commit

History

README.md

File metadata and controls

How to make QM-cluster models and input files with RINRUS

Using the driver

Input file

Log file

Step by step usage of RINRUS

1. Preparing structure files

2. Selecting the active site/generating the active site RIN

3. Trimming and capping the models

Trimming the models

Capping the truncation sites with hydrogens

4. Generating input files