From a6a06693c011f842095104c88b8f5672cf3c7e9a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jo=C3=A3o=20Capela?= Date: Thu, 23 Jan 2025 19:21:57 +0000 Subject: [PATCH] [FIX] readme --- README.md | 197 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 104 insertions(+), 93 deletions(-) diff --git a/README.md b/README.md index 38e23c7..c78e4ee 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,107 @@ +# Table of contents: + +- [Using DeepMol models](#using-deepmol-models) + - [How to use](#how-to-use) + - [Visualization](#visualization) +- [Case studies for the DeepMol publication](#case-studies-for-the-deepmol-publication) + - [AutoML experiments - TDC Commons](#automl-experiments---tdc-commons) + - [Comparison of DeepMol and QSARTuna](#comparison-of-deepmol-and-qsartuna) + - [Benchmark computational resources and runtimes of DeepMol](#benchmark-computational-resources-and-runtimes-of-deepmol) + - [Evaluate TDC commons benchmark datasets](#evaluate-tdc-commons-benchmark-datasets) + +# Using DeepMol models + +Models available so far: + +| Model Name | How to Call | Prediction Type | +|---------------------------------------------|---------------------------------|----------------------------------------------------------------| +| BBB (Blood-Brain Barrier) | `BBB` | Penetrates BBB (1) or does not penetrate BBB (0) | +| AMES Mutagenicity | `AMES` | Mutagenic (1) or not mutagenic (0) | +| Human plasma protein binding rate (PPBR) | `PPBR` | Rate of PPBR expressed in percentage | +| Volume of Distribution (VD) at steady state | `VDss` | Volume of Distribution expressed in liters per kilogram (L/kg)| +| Caco-2 (Cell Effective Permeability) | `Caco2` | Cell Effective Permeability (cm/s) | +| HIA (Human Intestinal Absorption) | `HIA` | Absorbed (1) or not absorbed (0) | +| Bioavailability | `Bioavailability` | Bioavailable (1) or not bioavailable (0) | +| Lipophilicity | `Lipophilicity` | Lipophilicity log-ratio | +| Solubility | `Solubility` | Solubility (log mol/L) | +| CYP P450 2C9 Inhibition | `CYP2C9Inhibition` | Inhibit (1) or does not inhibit (0) | +| CYP P450 3A4 Inhibition | `CYP3A4Inhibition` | Inhibit (1) or does not inhibit (0) | +| CYP2C9 Substrate | `CYP2C9Substrate`| Metabolized (1) or does not metabolize (0) | +| CYP2D6 Substrate | `CYP2D6Substrate`| Metabolized (1) or does not metabolize (0) | +| CYP3A4 Substrate | `CYP3A4Substrate`| Metabolized (1) or does not metabolize (0) | +| Hepatocyte Clearance | `HepatocyteClearance` | Drug hepatocyte clearance (uL.min-1.(10^6 cells)-1) | +| NPClassifier | `NPClassifier` | Pathway, Superclass, Class | +| Plants secondary metabolite precursors predictor | `PlantsSMPrecursorPredictor` | Precursor 1; Precursor 2 | +| Microsome Clearance | `MicrosomeClearance` | Drug microsome clearance (mL.min-1.g-1) | +| LD50 | `LD50` | LD50 (log(1/(mol/kg))) | +| hERG Blockers | `hERGBlockers` | hERG blocker (1) or not blocker (0) | + +## How to use: + +You can use them either individually or mixed together. + +You can call one model individually, pass a CSV file and get the results in one dataframe: + +```python +from deepmol_models import BBB +results = BBB().predict_from_csv("dataset.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") +results +``` + +| ID | SMILES | BBB Penetration | +|----|-------------------------------------------------|-----------------| +| 0 | OCC(S)CS | 1.0 | +| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | +| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | +| 3 | CC(=O)OCC1=C(C(=O)O)N2C(=O)[C@@H](NC(=O)CC#N)[...| 0.0 | +| 4 | CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](C(=O)O)c3ccsc3...| 0.0 | + + +Or pass SMILES strings and get the results in one dataframe: + +```python +from deepmol_models import BBB +results = BBB().predict_from_csv("dataset.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") +results +``` + +| ID | SMILES | BBB Penetration | +|----|-------------------------------------------------|-----------------| +| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | +| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | + +Complementarily, you can run several models: + +```python +from deepmol_models import BBB, PPBR, VDss, Caco2, HIA, Bioavailability, \ + Lipophilicity, Solubility, PlantsSMPrecursorPredictor, NPClassifier, MixedPredictor + +# results = MixedPredictor([BBB(), Caco2(), CYP2D6Inhibition(), NPClassifier()]).predict_from_csv("test_molecules.csv", "Drug", "Drug_ID", output_file="predictions.csv") +results = MixedPredictor([BBB(), PPBR(), VDss(), Caco2(), + HIA(), Bioavailability(), Lipophilicity(), + Solubility(), PlantsSMPrecursorPredictor(), NPClassifier()]).predict_from_csv("test_molecules.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") +results +``` + +| ID | SMILES | BBB Penetration | Human PPBR | VDss | Cell Effective Permeability | Human Intestinal Absorption | Bioavailability | Lipophilicity | Solubility | Precursors | Pathways | Superclass | Class | +|----|-------------------------------------------------|-----------------|------------|----------|-----------------------------|-----------------------------|-----------------|---------------|------------|---------------------------------------|--------------------------|-----------------------|-----------------------| +| 0 | OCC(S)CS | 1.0 | 64.832665 | 5.803529 | -4.725497 | 0.0 | 0.0 | 0.290602 | 0.117866 | | Fatty acyls | Fatty alcohols | Fatty alcohols | +| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | 32.882912 | 2.891243 | -4.989814 | 0.0 | 0.0 | 0.271549 | -0.606772 | L-Lysine | Alkaloids | Tyrosine alkaloids | Phenylethylamines | +| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | 41.540812 | 2.722000 | -6.059630 | 1.0 | 0.0 | 0.250526 | -1.701435 | Dimethylallyl diphosphate | Carbohydrates | Nucleosides | Purine nucleosides | +| 3 | CC(=O)OCC1=C(C(=O)O)N2C(=O)[C@@H](NC(=O)CC#N)[...] | 0.0 | 52.921825 | 0.329071 | -5.473660 | 0.0 | 0.0 | 0.267842 | -2.512460 | Geranylgeranyl diphosphate; L-Alanine | Amino acids and Peptides | β-lactams | Cephalosporins | + + +## Visualization + +You can use our API to access the bokeh representation of the chemical space and check some features of the molecules: + +```python +from deepmol_models import bokeh_plot +bokeh_plot(results, "Solubility", additional_labels=["Pathways", "Superclass", "Class"]) +``` + +![bokeh](https://github.com/BioSystemsUM/deepmol_case_studies/blob/main/example_bokeh.gif) + # Case studies for the DeepMol publication ## Installation @@ -120,96 +224,3 @@ from dcs.evaluation import get_results results = get_results(tdc_dataset_name="Bioavailability_Ma") ``` - -# Using DeepMol models - -Models available so far: - -| Model Name | How to Call | Prediction Type | -|---------------------------------------------|---------------------------------|----------------------------------------------------------------| -| BBB (Blood-Brain Barrier) | `BBB` | Penetrates BBB (1) or does not penetrate BBB (0) | -| AMES Mutagenicity | `AMES` | Mutagenic (1) or not mutagenic (0) | -| Human plasma protein binding rate (PPBR) | `PPBR` | Rate of PPBR expressed in percentage | -| Volume of Distribution (VD) at steady state | `VDss` | Volume of Distribution expressed in liters per kilogram (L/kg)| -| Caco-2 (Cell Effective Permeability) | `Caco2` | Cell Effective Permeability (cm/s) | -| HIA (Human Intestinal Absorption) | `HIA` | Absorbed (1) or not absorbed (0) | -| Bioavailability | `Bioavailability` | Bioavailable (1) or not bioavailable (0) | -| Lipophilicity | `Lipophilicity` | Lipophilicity log-ratio | -| Solubility | `Solubility` | Solubility (log mol/L) | -| CYP P450 2C9 Inhibition | `CYP2C9Inhibition` | Inhibit (1) or does not inhibit (0) | -| CYP P450 3A4 Inhibition | `CYP3A4Inhibition` | Inhibit (1) or does not inhibit (0) | -| CYP2C9 Substrate | `CYP2C9Substrate`| Metabolized (1) or does not metabolize (0) | -| CYP2D6 Substrate | `CYP2D6Substrate`| Metabolized (1) or does not metabolize (0) | -| CYP3A4 Substrate | `CYP3A4Substrate`| Metabolized (1) or does not metabolize (0) | -| Hepatocyte Clearance | `HepatocyteClearance` | Drug hepatocyte clearance (uL.min-1.(10^6 cells)-1) | -| NPClassifier | `NPClassifier` | Pathway, Superclass, Class | -| Plants secondary metabolite precursors predictor | `PlantsSMPrecursorPredictor` | Precursor 1; Precursor 2 | -| Microsome Clearance | `MicrosomeClearance` | Drug microsome clearance (mL.min-1.g-1) | -| LD50 | `LD50` | LD50 (log(1/(mol/kg))) | -| hERG Blockers | `hERGBlockers` | hERG blocker (1) or not blocker (0) | - -## How to use: - -You can use them either individually or mixed together. - -You can call one model individually, pass a CSV file and get the results in one dataframe: - -```python -from deepmol_models import BBB -results = BBB().predict_from_csv("dataset.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") -results -``` - -| ID | SMILES | BBB Penetration | -|----|-------------------------------------------------|-----------------| -| 0 | OCC(S)CS | 1.0 | -| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | -| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | -| 3 | CC(=O)OCC1=C(C(=O)O)N2C(=O)[C@@H](NC(=O)CC#N)[...| 0.0 | -| 4 | CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](C(=O)O)c3ccsc3...| 0.0 | - - -Or pass SMILES strings and get the results in one dataframe: - -```python -from deepmol_models import BBB -results = BBB().predict_from_csv("dataset.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") -results -``` - -| ID | SMILES | BBB Penetration | -|----|-------------------------------------------------|-----------------| -| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | -| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | - -Complementarily, you can run several models: - -```python -from deepmol_models import BBB, PPBR, VDss, Caco2, HIA, Bioavailability, \ - Lipophilicity, Solubility, PlantsSMPrecursorPredictor, NPClassifier, MixedPredictor - -# results = MixedPredictor([BBB(), Caco2(), CYP2D6Inhibition(), NPClassifier()]).predict_from_csv("test_molecules.csv", "Drug", "Drug_ID", output_file="predictions.csv") -results = MixedPredictor([BBB(), PPBR(), VDss(), Caco2(), - HIA(), Bioavailability(), Lipophilicity(), - Solubility(), PlantsSMPrecursorPredictor(), NPClassifier()]).predict_from_csv("test_molecules.csv", smiles_field="Drug", id_field="Drug_ID", output_file="predictions.csv") -results -``` - -| ID | SMILES | BBB Penetration | Human PPBR | VDss | Cell Effective Permeability | Human Intestinal Absorption | Bioavailability | Lipophilicity | Solubility | Precursors | Pathways | Superclass | Class | -|----|-------------------------------------------------|-----------------|------------|----------|-----------------------------|-----------------------------|-----------------|---------------|------------|---------------------------------------|--------------------------|-----------------------|-----------------------| -| 0 | OCC(S)CS | 1.0 | 64.832665 | 5.803529 | -4.725497 | 0.0 | 0.0 | 0.290602 | 0.117866 | | Fatty acyls | Fatty alcohols | Fatty alcohols | -| 1 | CC[N+](C)(C)c1cccc(O)c1 | 0.0 | 32.882912 | 2.891243 | -4.989814 | 0.0 | 0.0 | 0.271549 | -0.606772 | L-Lysine | Alkaloids | Tyrosine alkaloids | Phenylethylamines | -| 2 | Nc1ncnc2c1ncn2[C@@H]1O[C@H](CO)[C@@H](O)[C@@H]1O | 1.0 | 41.540812 | 2.722000 | -6.059630 | 1.0 | 0.0 | 0.250526 | -1.701435 | Dimethylallyl diphosphate | Carbohydrates | Nucleosides | Purine nucleosides | -| 3 | CC(=O)OCC1=C(C(=O)O)N2C(=O)[C@@H](NC(=O)CC#N)[...] | 0.0 | 52.921825 | 0.329071 | -5.473660 | 0.0 | 0.0 | 0.267842 | -2.512460 | Geranylgeranyl diphosphate; L-Alanine | Amino acids and Peptides | β-lactams | Cephalosporins | - - -## Visualization - -You can use our API to access the bokeh representation of the chemical space and check some features of the molecules: - -```python -from deepmol_models import bokeh_plot -bokeh_plot(results, "Solubility", additional_labels=["Pathways", "Superclass", "Class"]) -``` - -![bokeh](example_bokeh.gif)