
how to deal with Explicit valence for atom #10

Open
pritam0070 opened this issue Nov 28, 2024 · 4 comments

Comments

@pritam0070

Running tutorial/qsar.ipynb fails with the following error:

qsprpred - ERROR - AtomValenceException('Explicit valence for atom # 0 C greater than permitted')
multiprocessing.pool.RemoteTraceback

I also used RDKit to clean up and validate the molecules, but I still get the same error:

Chem.Cleanup(mol)  # Cleanup
Chem.SanitizeMol(mol)  # Validate
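A minimal sketch for locating the molecules that trigger this kind of valence error, assuming RDKit is installed and the data set is a tab-separated file with a SMILES column named `SMILES` (the column name and the helper `find_unsanitizable` are hypothetical, not part of QSPRpred or DrugEx). Each SMILES is parsed without sanitization and `Chem.SanitizeMol` is then called explicitly, so failures are collected per row instead of aborting the whole run. It only checks RDKit sanitization; a molecule can pass this check and still fail in a later processing step.

```python
import csv
from typing import List, Tuple

from rdkit import Chem


def find_unsanitizable(
    path: str, smiles_column: str = "SMILES", delimiter: str = "\t"
) -> List[Tuple[int, str, str]]:
    """Return (data row index, SMILES, error message) for rows RDKit cannot sanitize.

    Row indices are zero-based and exclude the header line.
    The default column name "SMILES" is an assumption; adjust it to your file.
    """
    problems = []
    with open(path, newline="") as fh:
        for row_idx, row in enumerate(csv.DictReader(fh, delimiter=delimiter)):
            smiles = row[smiles_column]
            # Parse without sanitization so the checks can be run (and caught) explicitly.
            mol = Chem.MolFromSmiles(smiles, sanitize=False)
            if mol is None:
                problems.append((row_idx, smiles, "SMILES could not be parsed"))
                continue
            try:
                # Valence problems (e.g. a pentavalent carbon) raise an exception here.
                Chem.SanitizeMol(mol)
            except Exception as exc:
                problems.append((row_idx, smiles, str(exc)))
    return problems
```

For example, after unzipping the attached file, `for row, smi, msg in find_unsanitizable("EGFR_LIGANDS.tsv"): print(row, smi, msg)` would list the problematic entries, provided the SMILES column name matches.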

@martin-sicho
Contributor

Hi, it is strange this is giving you trouble. Can you share the structure of the molecule that may be causing it? Or the data set? If not, the full traceback would also help. I think this might be something on the QSPRpred side, which applies standardization by default for certain operations. I am guessing this method is the culprit, but I would need the traceback from the notebook to be sure.

@martin-sicho
Contributor

Also, there was a new release in the meantime: https://github.com/CDDLeiden/DrugEx/releases/tag/v3.4.8, which I do not think affects your case, but you might want to update since it fixes other issues.

@pritam0070
Author

I prepared a dataset for the P00533 (EGFR) protein using QSPRpred's create_tutorial_data.py and did all the standardization with RDKit, but the same error persists. As requested, I am sharing the file with you. If you are able to solve the issue, please share the solution with me.
EGFR_LIGANDS.tsv.zip

@martin-sicho
Contributor

Hi @pritam0070,

Many thanks for the detailed information. I was able to reproduce the problem, and it is an issue with the ScaffoldSplit: the scaffold perception algorithm seems to fail on one or more compounds, and QSPRpred does not currently handle this properly. I will work on a fix on the QSPRpred side, but if you do not specifically need a scaffold split, you can use a ClusterSplit instead, which should even be slightly more challenging. This is how you would use it in the qsar.ipynb notebook:

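```python
from qsprpred.data.descriptors.fingerprints import MorganFP
from qsprpred.data.sampling.splits import ClusterSplit

# Cluster-based split: 20 % of the compounds go to the test set
split = ClusterSplit(test_fraction=0.2)

# `dataset` is the QSPRDataset created earlier in the qsar.ipynb notebook
dataset.prepareDataset(
    split=split,
    feature_calculators=[MorganFP(radius=3, nBits=2048)]
)
```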

I hope this helps and I will update you on the progress of fixing the scaffold split issue if it is something you are specifically interested in using.
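If a scaffold split is specifically needed before a fix is available, one possible workaround is to set aside the compounds whose scaffold cannot be computed. The sketch below is a hypothetical, stand-alone scaffold split, not QSPRpred's `ScaffoldSplit` or its eventual fix: it groups compounds by Bemis-Murcko scaffold using RDKit's `MurckoScaffoldSmiles`, returns the indices of compounds whose scaffold computation raised an error separately, and assigns whole scaffold groups to the training set, largest first, until roughly `1 - test_fraction` of the remaining compounds are used. The names `scaffold_split`, `murcko_scaffold`, and the `scaffold_fn` parameter are illustrative assumptions; `scaffold_fn` lets the grouping logic be swapped or tested without RDKit.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple


def murcko_scaffold(smiles: str) -> str:
    """Bemis-Murcko scaffold SMILES via RDKit (imported lazily)."""
    from rdkit.Chem.Scaffolds.MurckoScaffold import MurckoScaffoldSmiles

    # An unparsable SMILES raises ValueError; other RDKit errors may also surface here.
    return MurckoScaffoldSmiles(smiles=smiles)


def scaffold_split(
    smiles_list: Sequence[str],
    test_fraction: float = 0.2,
    scaffold_fn: Callable[[str], str] = murcko_scaffold,
) -> Tuple[List[int], List[int], List[int]]:
    """Split compound indices into (train, test, failed) by scaffold group."""
    groups = defaultdict(list)  # scaffold -> indices of compounds sharing it
    failed = []  # compounds whose scaffold could not be computed
    for idx, smi in enumerate(smiles_list):
        try:
            scaffold = scaffold_fn(smi)
        except Exception:
            failed.append(idx)
            continue
        groups[scaffold].append(idx)

    n_valid = len(smiles_list) - len(failed)
    train_cutoff = (1.0 - test_fraction) * n_valid
    train, test = [], []
    # Largest scaffold groups first (ties: earliest compound); each group stays on one side.
    for members in sorted(groups.values(), key=lambda m: (-len(m), m[0])):
        if len(train) + len(members) <= train_cutoff:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test), failed
```

The compounds listed in `failed` can then be inspected or removed from the data set before it is used with a scaffold-based split.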
