first pass at charge tutorial updates

OpenFreeEnergy · Feb 6, 2025 · 3e25848 · 3e25848
1 parent cf41975
commit 3e25848
Show file tree

Hide file tree

Showing 3 changed files with 171 additions and 7 deletions.
diff --git a/cli_tutorials/cli_charge_molecules.md b/cli_tutorials/cli_charge_molecules.md
@@ -0,0 +1,103 @@
+# Generating Partial Charges with the OpenFE CLI
+
+This tutorial will show you how to use the OpenFE CLI to generate and store partial charges for a series of ligands 
+which can be used with OpenFE protocols. It is recommended to use a single set of charges for each ligand to ensure reproducibility
+between repeats or consistent charges between different legs of a calculation involving the same ligand, like a relative binding
+affinity calculation for example. As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate 
+partial charges for ligands making it expensive to run multiple network mappings while finding the optimal one for the resources 
+available. Hence, the `charge-molecules` command offers a way to generate and store the charges into an SDF file which 
+can be used with the rest of the OpenFE CLI and python API. 
+
+## Charging Molecules
+
+The `charge-molecules` command allows you to generate partial charges a series of small molecules saved in SDF or MOL2 
+format using the `am1bcc` method calculated using `ambertools`:
+
+```bash
+openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf
+```
+
+This will result in a new SDF file `charged_tyk2_ligands.sdf` which contains the same ligands and their partial charges 
+stored in a new SD tag like so:
+
+```text
+lig_ejm_42
+     RDKit          3D
+
+ 35 36  0  0  0  0  0  0  0  0999 V2000
+   -4.7651   -2.8327  -16.5085 H   0  0  0  0  0  0  0  0  0  0  0  0
+   -5.3566   -3.6931  -16.2274 C   0  0  0  0  0  0  0  0  0  0  0  0
+   -4.7703   -4.9699  -16.2000 C   0  0  0  0  0  0  0  0  0  0  0  0
+[continues]
+ 28 31  1  0
+M  END
+
+>  <ofe-name>
+lig_ejm_42
+
+>  <atom.dprop.PartialCharge>
+0.14794282857142857 -0.096057171428571425 -0.12905717142857143 ...
+[continues]
+```
+
+Generating partial charges with the `am1bcc` method can be slow as they require a semi-empirical quantum chemical calculation, 
+we can however take advantage of multiprocessing to calculate the charges in parallel for each ligand which offers a 
+significant speed-up. The number of processors available for the workflow can be specified using the `-n` flag:
+
+```bash
+openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf -n 4
+```
+
+## Customizing the Charge Method
+
+There are a wide range of partial charge generation methods available with `am1bcc` based schemes being most commonly 
+used with OpenFF force fields. The choice of charge scheme can be easily customised by providing a settings file in `.yaml` format. 
+For example to recreate the current default settings in the workflow you would do the following:
+
+1. Provide a file like `settings.yaml` with the desired settings:
+
+```yaml
+partial_charge:
+  method: am1bcc
+  settings:
+    off_toolkit_backend: ambertools
+```
+
+2. Charge the ligands with an additional `-s` flag for passing the settings:
+
+```bash
+openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf -n 4 -s settings.yaml
+```
+
+3. The output of the CLI program will now reflect the changes made:
+
+```text
+SMALL MOLECULE PARTIAL CHARGE GENERATOR
+_________________________________________
+
+Parsing in Files: 
+	Got input: 
+		Small Molecules: SmallMoleculeComponent(name=lig_ejm_31) SmallMoleculeComponent(name=lig_ejm_42) SmallMoleculeComponent(name=lig_ejm_43) SmallMoleculeComponent(name=lig_ejm_46) SmallMoleculeComponent(name=lig_ejm_47) SmallMoleculeComponent(name=lig_ejm_48) SmallMoleculeComponent(name=lig_ejm_50) SmallMoleculeComponent(name=lig_jmc_23) SmallMoleculeComponent(name=lig_jmc_27) SmallMoleculeComponent(name=lig_jmc_28)
+Using Options:
+	Partial Charge Generation: am1bcc
+```
+
+The full range of partial charge settings can be found in the snipit bellow, note that some may require installing extra packages.
+
+```yaml
+partial_charge:
+  method: am1bcc
+  # method: am1bccelf10
+  # method: espaloma
+  # method: nagl
+  settings:
+    off_toolkit_backend: ambertools
+    # off_toolkit_backend: openeye  # required for the am1bccelf10 method
+    number_of_conformers: null  # null specifies the use of the input conformer, a value requests that a new conformer be generated
+    # nagl_model: null  # null specifies the use of the latest nagl model
+```
+
+## Overwriting Charges
+
+By default, the `charge-molecules` command will only assign partial charges to ligands which **do not** already have charges, 
+this behaviour can be changed via the `--overwrite-charges` flag which will assign new charges using the specified settings.
diff --git a/rbfe_tutorial/cli_tutorial.md b/rbfe_tutorial/cli_tutorial.md
@@ -50,9 +50,21 @@ we do the following:
 - Instruct `openfe` to output files into a directory called `network_setup`
     with the `-o network_setup` option.
 
-Planning the campaign may take some time, as it tries to find the best
-network from all possible transformations. This will create a directory called
-`network_setup/`, which is structured like this:
+Planning the campaign may take some time due to the complex series of tasks involved:
+
+- partial charges are generated for each of the ligands to ensure reproducibility, by default this requires a semi-empirical quantum
+chemical calculation to calculate `am1bcc` charges
+- atom mappings are created and scored based on the perceived difficulty for all possible ligand pairs 
+- an optimal network is extracted from all possible pairwise transformations which balances edge redundancy and the total difficulty score of the network
+
+The partial charge generation can take advantage of multiprocessing which offers a significant speed-up, you can specify 
+the number of processors available using the `-n` flag:
+
+```bash
+openfe plan-rbfe-network -M tyk2_ligands.sdf -p tyk2_protein.pdb -o network_setup -n 4
+```
+
+This will result in a directory called `network_setup/`, which is structured like this:
 
 <!-- top lines from `tree network_setup` -->
 
@@ -87,7 +99,7 @@ The files that describe each individual simulation we will run are located withi
 leg to run and contains all the necessary information to run that leg.
 Filenames indicate ligand names as taken from the SDF; for example, the file
 `easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json` is the leg
-associated with the tranformation of the ligand `lig_ejm_31` into `lig_ejm_42`
+associated with the transformation of the ligand `lig_ejm_31` into `lig_ejm_42`
 while in complex with the protein.
 
 A single RBFE between a pair of ligands requires running two legs of an alchemical cycle (JSON files):
@@ -112,8 +124,9 @@ OpenFE contains many different options and methods for setting up a simulation c
 The options can be easily accessed and modified by providing a settings
 file in the `.yaml` format.
 Let's assume you want to exchange the LOMAP atom mapper with the Kartograf
-atom mapper and the Minimal Spanning Tree
-Network Planner with the Maximal Network Planner, then you could do the following:
+atom mapper, the Minimal Spanning Tree
+Network Planner with the Maximal Network Planner and the am1bcc charge method with the am1bccelf10 version from openeye, 
+then you could do the following:
 
 1. provide a file like `settings.yaml` with the desired changes:
 
@@ -123,6 +136,11 @@ mapper:
 
 network:
   method: generate_maximal_network
+
+partial_charge:
+  method: am1bccelf10
+  settings:
+    off_toolkit_backend: openeye
 ```
 
 2. Plan your rbfe network with an additional `-s` flag for passing the settings:
@@ -148,6 +166,7 @@ Using Options:
         Mapper: <kartograf.atom_mapper.KartografAtomMapper object at 0x7fea079de790>
         Mapping Scorer: <function default_lomap_score at 0x7fea1b423d80>
         Networker: functools.partial(<function generate_maximal_network at 0x7fea18371260>)
+        Partial Charge Generation: am1bccelf10
 ```
 
 That concludes the straightforward process of tailoring your OpenFE setup to your specifications.
@@ -166,6 +185,12 @@ network:
   # method: generate_radial_network
   # method: generate_maximal_network
   # method: generate_minimal_redundant_network
+
+partial_charge:
+  method: am1bcc
+  # method: am1bccelf10
+  # settings:
+    # off_toolkit_backend: openeye  # required for the am1bccelf10 method
 ```
 
 **Customize away!**

diff --git a/rbfe_tutorial/python_tutorial.ipynb b/rbfe_tutorial/python_tutorial.ipynb
@@ -47,6 +47,42 @@
     "ligands = [openfe.SmallMoleculeComponent.from_rdkit(mol) for mol in supp]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8e5de19a",
+   "metadata": {},
+   "source": [
+    "## Charging the ligands\n",
+    "\n",
+    "It is recommended to use a single set of charges for each ligand to ensure reproducibility between repeats or consistent charges between different legs of a calculation involving the same ligand, like a relative binding affinity calculation for example. \n",
+    "\n",
+    "Here we will use some utility functions from OpenFE which can assign partial charges to a series of molecules with a variety of methods which can be configured via the `OpenFFPartialChargeSettings` class. In this example \n",
+    "we will charge the ligands using the `am1bcc` method from `ambertools` which is the default charge scheme used by OpenFE."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5219106c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openfe.protocols.openmm_utils.omm_settings import OpenFFPartialChargeSettings\n",
+    "from openfe.protocols.openmm_utils.charge_generation import bulk_assign_partial_charges\n",
+    "\n",
+    "charge_settings = OpenFFPartialChargeSettings(partial_charge_method=\"am1bcc\", off_toolkit_backend=\"ambertools\")\n",
+    "\n",
+    "charged_ligands = bulk_assign_partial_charges(\n",
+    "    molecules=ligands,\n",
+    "    overwrite=False,  \n",
+    "    method=charge_settings.partial_charge_method,\n",
+    "    toolkit_backend=charge_settings.off_toolkit_backend,\n",
+    "    generate_n_conformers=charge_settings.number_of_conformers,\n",
+    "    nagl_model=charge_settings.nagl_model,\n",
+    "    processors=1\n",
+    ")"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "6963be83",
@@ -93,7 +129,7 @@
    "outputs": [],
    "source": [
     "ligand_network = network_planner(\n",
-    "    ligands=ligands,\n",
+    "    ligands=charged_ligands,\n",
     "    mappers=[mapper],\n",
     "    scorer=scorer\n",
     ")"