
Do not have files for running make_types.py when preparing custom data for training a new classifier #26

Open
mainguyenanhvu opened this issue Apr 25, 2023 · 3 comments

Comments

@mainguyenanhvu

mainguyenanhvu commented Apr 25, 2023

I am trying to follow your instructions to prepare data for training a new classifier.
I am stuck at the make_types step because I can't find the train.txt and test.txt files.

Moreover, I have 4 questions:

  1. If I want to add several PDB files to the existing scPDB dataset, how can I do that?
  2. Your instructions for preparing data only work for a single PDB file, don't they? If so, I will need to write a pipeline to wrap them.
  3. How do I prepare the train.txt and test.txt files needed to run make_types.py?
  4. Could you please show me which files/folders from the previous step are needed as input to each step?

I tried it on this pdb.

Thank you very much.

@mainguyenanhvu mainguyenanhvu changed the title Do not have files for make_types when preparing custom data for training a new classifier Do not have files for running make_types.py when preparing custom data for training a new classifier Apr 26, 2023
@RishalAggarwal
Collaborator

Hey, thanks for your interest.

  1. To add more files to the scPDB dataset, you will have to create new types/molcache files for training.
  2. Yes, the first 4 steps are for a single PDB.
  3. I believe the train.txt and test.txt files just need to contain the protein-ligand complexes you are training/testing on.
  4. I'm not sure what you mean here, but you need to make sure (unfortunately, inside the scripts themselves) that all the file paths are valid.

Let me know if you are facing any errors in the process.
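For point 3, a minimal sketch of what such split files might look like, assuming make_types.py only expects one protein-ligand complex name per line (this format is an assumption; verify it against the script's parsing):

```python
# Hypothetical sketch: write train.txt/test.txt as plain lists of
# protein-ligand complex names, one per line. The one-name-per-line
# layout is an assumption, not confirmed by the repository.
from pathlib import Path

def write_split(names, path):
    """Write one complex name per line."""
    Path(path).write_text("\n".join(names) + "\n")

complexes = ["1a2b", "3c4d", "5e6f"]  # illustrative complex identifiers
write_split(complexes[:2], "train.txt")
write_split(complexes[2:], "test.txt")
```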

@mainguyenanhvu
Author

mainguyenanhvu commented May 1, 2023

Thanks for your reply. I added code for a data preparation pipeline and made a pull request. If you have time, please check whether I made any mistakes.

To reply to your answers:
3. I understand that they contain the names of the PDB files I would like to use to generate the dataset.
4. After reading your code, I understood the input and output of each step.

Additionally, I would like to ask:

  1. In https://github.com/devalab/DeepPocket/blob/main/make_types.py, you use a _ligand.sdf file. I would like to know what this file contains and how to create it. I tried to create it from the original PDB file by exporting only the ligand. Could you please send me some pairs of PDB and SDF files so I can understand more easily?
  2. If bary_centers.txt is empty after running get_centers.py, how can I fix it?
  3. I would like to have your prepared dataset (scPDB, ...); could you please send it to me?
  4. When I run types_gninatyper.py, it prints the warnings below; how can I fix them?
==============================
*** Open Babel Warning  in parseAtomRecord
  WARNING: Problems reading a PDB file
  Problems reading a HETATM or ATOM record.
  According to the PDB specification,
  columns 79-80 should contain charge of the atom
  but OpenBabel found ' 0' (atom 5489).
==============================
*** Open Babel Warning  in parseAtomRecord
  WARNING: Problems reading a PDB file
  Problems reading a HETATM or ATOM record.
  According to the PDB specification,
  columns 79-80 should contain charge of the atom
  but OpenBabel found ' 0' (atom 5490).

Please help me. Thank you very much.

@RishalAggarwal
Collaborator

RishalAggarwal commented May 17, 2023

Thank you for the pull request, I will check it when I get more time.

  1. Yes, the *_ligand.sdf file contains only the ligand coordinates. You can extract them from the HETATM records of any PDB file (be sure not to include waters).
  2. If bary_centers.txt is empty, it probably means fpocket did not identify a pocket in that protein. Since DeepPocket depends on pockets found by fpocket, there is no fix for this.
  3. The datasets are available at the link provided in the README.md. Is there anything in particular you are looking for?
  4. The warning comes from Open Babel but is safe to ignore.
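The HETATM extraction in point 1 can be sketched as below. The records and names are illustrative; the only assumption is the standard PDB column layout (residue name in columns 18-20):

```python
# Sketch: keep only HETATM records that are not water (residue name HOH),
# so the ligand atoms can then be converted to SDF separately.
def extract_hetatm(pdb_lines):
    """Return HETATM lines whose residue is not water."""
    return [
        line for line in pdb_lines
        if line.startswith("HETATM") and line[17:20].strip() != "HOH"
    ]

# Illustrative records in standard PDB fixed-column format.
sample = [
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n",
    "HETATM 5489  C1  LIG A 401       8.000   9.000  10.000  1.00 30.00           C\n",
    "HETATM 5490  O   HOH A 501       1.000   2.000   3.000  1.00 40.00           O\n",
]
ligand_lines = extract_hetatm(sample)  # keeps only the LIG record
```

The resulting ligand-only records can then be written to a PDB file and converted to SDF, e.g. with Open Babel (`obabel ligand.pdb -O ligand.sdf`).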
