
Do not have files for running make_types.py when preparing custom data for training a new classifier #26

Open
mainguyenanhvu opened this issue Apr 25, 2023 · 3 comments

Comments

@mainguyenanhvu

mainguyenanhvu commented Apr 25, 2023

I am trying to follow your instructions to prepare data for training a new classifier.
I am stuck at the make_types step because I can't find the train.txt and test.txt files.

Moreover, I have 4 questions:

  1. If I want to add several PDB files to the existing scPDB dataset, how can I do that?
  2. Your instructions for preparing data only work for a single PDB file, don't they? If so, I will need to write a pipeline to wrap them.
  3. How do I prepare the train.txt and test.txt files needed to run make_types.py?
  4. Could you please show me which files/folders from the previous step are needed as input to each step?

I tried it on this pdb.

Thank you very much.

@mainguyenanhvu mainguyenanhvu changed the title Do not have files for make_types when preparing custom data for training a new classifier Do not have files for running make_types.py when preparing custom data for training a new classifier Apr 26, 2023
@RishalAggarwal
Collaborator

Hey, thanks for your interest.

  1. To add more files to the scPDB dataset, you will have to create new types/molcache files for training.
  2. Yes, the first 4 steps are for a single PDB.
  3. I believe the train.txt and test.txt files just need to contain the protein-ligand complexes you are training/testing on.
  4. I'm not sure what you mean here, but you need to make sure (unfortunately, inside the scripts themselves) that all the file paths are valid.

Let me know if you are facing any errors in the process.
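For point 3, a minimal sketch of what such split files might look like, assuming make_types.py only expects one protein-ligand complex name per line (this format is an assumption; verify it against the script's parsing):

```python
# Hypothetical sketch: write train.txt/test.txt as plain lists of
# protein-ligand complex names, one per line. The one-name-per-line
# layout is an assumption, not confirmed by the repository.
from pathlib import Path

def write_split(names, path):
    """Write one complex name per line."""
    Path(path).write_text("\n".join(names) + "\n")

complexes = ["1a2b", "3c4d", "5e6f"]  # illustrative complex identifiers
write_split(complexes[:2], "train.txt")
write_split(complexes[2:], "test.txt")
```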

@mainguyenanhvu
Author

mainguyenanhvu commented May 1, 2023

Thanks for your reply. I added code for a data preparation pipeline and made a pull request. If you have time, please check whether I made any mistakes.

To reply to your answers:
3. I understand that they contain the names of the PDB files I would like to use to generate the dataset.
4. After reading your code, I understood the input and output of each step.

Additionally, I would like to ask:

  1. In https://github.com/devalab/DeepPocket/blob/main/make_types.py, you use a _ligand.sdf file. I would like to know what this file contains and how to create it. I tried to create it from the original PDB file by exporting only the ligand. Could you please send me some pairs of PDB and SDF files so I can understand more easily?
  2. If bary_centers.txt is empty after running get_centers.py, how can I fix it?
  3. I would like to have your prepared dataset (scPDB, ...); could you please send it to me?
  4. When I run types_gninatyper.py, it prints the warnings below; how can I fix them?
==============================
*** Open Babel Warning  in parseAtomRecord
  WARNING: Problems reading a PDB file
  Problems reading a HETATM or ATOM record.
  According to the PDB specification,
  columns 79-80 should contain charge of the atom
  but OpenBabel found ' 0' (atom 5489).
==============================
*** Open Babel Warning  in parseAtomRecord
  WARNING: Problems reading a PDB file
  Problems reading a HETATM or ATOM record.
  According to the PDB specification,
  columns 79-80 should contain charge of the atom
  but OpenBabel found ' 0' (atom 5490).

Please help me. Thank you very much.

@RishalAggarwal
Collaborator

RishalAggarwal commented May 17, 2023

Thank you for the pull request, I will check it when I get more time.

  1. Yes, the *_ligand.sdf file contains only the ligand coordinates. You can extract them from the HETATM records of any PDB file (be sure not to include waters).
  2. If bary_centers.txt is empty, it probably means fpocket did not identify a pocket in that protein. Since DeepPocket depends on pockets found by fpocket, there is no fix for this.
  3. The datasets are available at the link provided in the README.md. Is there anything in particular you are looking for?
  4. The warning comes from Open Babel but is safe to ignore.
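The HETATM extraction in point 1 can be sketched as below. The records and names are illustrative; the only assumption is the standard PDB column layout (residue name in columns 18-20):

```python
# Sketch: keep only HETATM records that are not water (residue name HOH),
# so the ligand atoms can then be converted to SDF separately.
def extract_hetatm(pdb_lines):
    """Return HETATM lines whose residue is not water."""
    return [
        line for line in pdb_lines
        if line.startswith("HETATM") and line[17:20].strip() != "HOH"
    ]

# Illustrative records in standard PDB fixed-column format.
sample = [
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n",
    "HETATM 5489  C1  LIG A 401       8.000   9.000  10.000  1.00 30.00           C\n",
    "HETATM 5490  O   HOH A 501       1.000   2.000   3.000  1.00 40.00           O\n",
]
ligand_lines = extract_hetatm(sample)  # keeps only the LIG record
```

The resulting ligand-only records can then be written to a PDB file and converted to SDF, e.g. with Open Babel (`obabel ligand.pdb -O ligand.sdf`).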
