Add Feng Bash Scripts and Slurm Scripts for NERSC #2

Merged
merged 18 commits into from
Mar 13, 2024
126cc01
Added bash script for processing CoDNaS database
smallfishabc Jan 13, 2024
4b5c00c
Added Slurm script for processing PDB70 dataset on NERSC
smallfishabc Jan 25, 2024
8b073d6
Delete .DS_Store
smallfishabc Jan 25, 2024
f434fd5
Merge branch 'main' of https://github.com/lbl-cbg/metfish
smallfishabc Jan 25, 2024
0fb0993
Update PDB70_NERSC/parallel_PDB70.slurm
smallfishabc Jan 25, 2024
099284e
Added a scripts folder to store individual analyze scripts
smallfishabc Jan 25, 2024
858e407
Added a scripts folder to store individual analyze scripts
smallfishabc Jan 25, 2024
228975d
Merge branch 'add_feng_script' of https://github.com/lbl-cbg/metfish …
smallfishabc Jan 25, 2024
37cf75e
Merge branch 'main' into add_feng_script
smallfishabc Jan 25, 2024
fe1132c
Added PDBtoSeq as a function to utils.py and a command to commands.py
smallfishabc Jan 26, 2024
70a1138
Update scripts/PDB70_NERSC/fixpdb_pr_parallel_v2.bash
smallfishabc Jan 31, 2024
a8607f1
Update scripts/PDB70_NERSC/fixpdb_pr_parallel_v2.bash
smallfishabc Jan 31, 2024
b819564
add .DS_store to gitignore
smallfishabc Feb 23, 2024
e7f6003
Retrive full sequence from RCSB to fix the conformation of PDBs.
smallfishabc Mar 1, 2024
d88c6c3
replace the old bash script with new script to fix the conformations.
smallfishabc Mar 1, 2024
74dfc69
Change the scripts to the actual scripts running on NERSC
smallfishabc Mar 6, 2024
5169f54
Added README file for NERSC scripts.
smallfishabc Mar 6, 2024
6ab54cc
Added the github url for the modified pdbfixer
smallfishabc Mar 6, 2024
Added PDBtoSeq as a function to utils.py and a command to commands.py
smallfishabc committed Jan 26, 2024
commit fe1132c04da723fbedb135bd75f897caeb381b5d
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -37,6 +37,7 @@ dynamic = ["version"]

[project.scripts]
calc-pr = "metfish.commands:get_Pr_cli"
+extract-seq = "metfish.commands:extract_seq_cli"

[tool.ruff]
# Exclude a variety of commonly ignored directories.
4 changes: 2 additions & 2 deletions scripts/PDB70_NERSC/PDBtoSeq.py
Original file line number Diff line number Diff line change
@@ -20,8 +20,8 @@ def main():
description = '''Extract Sequence from PDB using the BioPython API.
The output will be stored at the current directory as a fasta file.''' )

-    parser.add_argument('-f', '--filename', required=True)
-    parser.add_argument('-o', '--output', default='./')
+    parser.add_argument('-f', '--filename', required=True, help='input pdb file name')
+    parser.add_argument('-o', '--output', required=True, help='output fasta file name')

args = parser.parse_args()

14 changes: 14 additions & 0 deletions src/metfish/commands.py
Original file line number Diff line number Diff line change
@@ -5,6 +5,7 @@
import pandas as pd

from .utils import get_Pr
+from .utils import extract_seq

def get_Pr_cli(argv=None):

@@ -40,3 +41,16 @@ def get_Pr_cli(argv=None):
step=args.step)

pd.DataFrame({"r": r, "P(r)": p}).to_csv(out, index=False)

+def extract_seq_cli(argv=None):
+
+    parser = argparse.ArgumentParser(
+        description = '''Extract Sequence from PDB using the BioPython API.
+        The output will be stored at the current directory as a fasta file.''' )
+
+    parser.add_argument('-f', '--filename', required=True, help="input pdb file")
+    parser.add_argument('-o', '--output', required=True, help="output fasta file path + name")
+
+    args = parser.parse_args()
+
+    extract_seq(args.filename,args.output)
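One detail in the command above is that `argv` is accepted but `parser.parse_args()` is called without it, so the function always reads `sys.argv`. The sketch below shows one possible variant that forwards `argv`, which makes the command callable from tests. It is an illustration only, not the code in this PR: `build_parser`, `extract_seq_cli_sketch`, and the `extract` parameter are hypothetical names, and the `metfish.utils` import path is assumed from the diff.

```python
import argparse


def build_parser():
    """Build a parser mirroring the two options added in this commit."""
    parser = argparse.ArgumentParser(
        description="Extract the sequence from a single-chain PDB file "
                    "and write it as a FASTA file."
    )
    parser.add_argument("-f", "--filename", required=True, help="input pdb file")
    parser.add_argument("-o", "--output", required=True,
                        help="output fasta file path + name")
    return parser


def extract_seq_cli_sketch(argv=None, extract=None):
    """Hypothetical variant of extract_seq_cli that honors argv.

    `extract` is an illustrative injection point (not part of the PR) so the
    command can be exercised without Biopython installed.
    """
    # With argv=None, argparse falls back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    if extract is None:
        # Assumed import path, taken from the diff of src/metfish/utils.py.
        from metfish.utils import extract_seq as extract
    extract(args.filename, args.output)
    return args


# Usage: pass the arguments explicitly and record what would be extracted.
calls = []
parsed = extract_seq_cli_sketch(
    ["-f", "1abc.pdb", "-o", "out/1abc.fasta"],
    extract=lambda pdb, out: calls.append((pdb, out)),
)
```

Forwarding `argv` keeps the installed `extract-seq` entry point unchanged, since console scripts call the function with no arguments.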
24 changes: 24 additions & 0 deletions src/metfish/utils.py
Original file line number Diff line number Diff line change
@@ -6,6 +6,8 @@
import numpy as np
from periodictable import elements
from scipy.spatial.distance import pdist, squareform
+from Bio import SeqIO
+from Bio.SeqRecord import SeqRecord


n_elec_df = {el.symbol: el.number for el in elements}
@@ -69,3 +71,25 @@
p = np.concatenate(([0], hist / hist.sum()))

return r, p

+def extract_seq(pdb_input,output_path):
+    """
+    Args:
+        pdb_input : The path to the PDB to extract sequence.
+                    The PDB must only contain a single chain.
+                    pdbfixer is a good tool to prepare PDB for this function
+        output_path : The path to store the output fasta file.
+                      Example: /location/to/store/PDB.fasta
+    Returns:
+        There is no return for this function. The sequence will be written
+        as a fasta file in the given location.
+    """
+    pdb_name=os.path.basename(pdb_input).split(".")[0]
+    counter=1
+    for record in SeqIO.parse(pdb_input,"pdb-atom"):
+        if counter > 1:
+            raise ValueError("More than 1 Chain is in the file {}".format(pdb_input))

Codecov / codecov/patch: check warning on line 91 in src/metfish/utils.py; added lines #L87-L91 were not covered by tests.
+        else:
+            new_seq_record = SeqRecord(record.seq, id=pdb_name, description='')
+            SeqIO.write(new_seq_record, output_path ,"fasta")
+        counter+=1

Codecov / codecov/patch: check warning on line 95 in src/metfish/utils.py; added lines #L93-L95 were not covered by tests.
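The single-chain rule that `extract_seq` enforces, and the two branches Codecov flags as untested, can be illustrated without Biopython. The following is a minimal stdlib-only sketch, not the PR's implementation: it reads fixed-column `ATOM` records directly instead of using `SeqIO.parse(..., "pdb-atom")`, does not fill numbering gaps with `X` as Biopython's parser does, and, unlike the original, raises when no `ATOM` records are found. `extract_seq_sketch` and `atom_line` are hypothetical helper names.

```python
import os
import tempfile

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_seq_sketch(pdb_input, output_path):
    """Write the sequence of a single-chain PDB file as FASTA (illustrative)."""
    pdb_name = os.path.basename(pdb_input).split(".")[0]
    chains = {}   # chain id -> list of one-letter codes, in file order
    seen = set()  # (chain id, residue number + insertion code) already counted
    with open(pdb_input) as handle:
        for line in handle:
            if not line.startswith("ATOM"):
                continue
            chain_id = line[21]
            res_key = (chain_id, line[22:27])
            if res_key in seen:
                continue  # another atom of a residue we already recorded
            seen.add(res_key)
            code = THREE_TO_ONE.get(line[17:20], "X")  # unknown residues become X
            chains.setdefault(chain_id, []).append(code)
    if len(chains) > 1:
        raise ValueError("More than 1 Chain is in the file {}".format(pdb_input))
    if not chains:
        raise ValueError("No ATOM records found in {}".format(pdb_input))
    sequence = "".join(next(iter(chains.values())))
    with open(output_path, "w") as out:
        out.write(">{}\n{}\n".format(pdb_name, sequence))


def atom_line(serial, name, res, chain, resseq):
    """Format a minimal fixed-column ATOM record with dummy coordinates."""
    return "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}\n".format(
        serial, name, res, chain, resseq, 0.0, 0.0, 0.0)


# Usage: a two-residue, single-chain file (ALA has two atoms on purpose).
workdir = tempfile.mkdtemp()
pdb_path = os.path.join(workdir, "demo.pdb")
fasta_path = os.path.join(workdir, "demo.fasta")
with open(pdb_path, "w") as fh:
    fh.write(atom_line(1, "N", "ALA", "A", 1))
    fh.write(atom_line(2, "CA", "ALA", "A", 1))
    fh.write(atom_line(3, "CA", "GLY", "A", 2))
extract_seq_sketch(pdb_path, fasta_path)
with open(fasta_path) as fh:
    fasta_text = fh.read()
```

A test for the real `extract_seq` could follow the same shape: one small single-chain file for the write branch and one two-chain file wrapped in a `ValueError` check for the raise branch.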