Benchmark systems list #52

lilyminium · 2025-03-10T11:03:08Z

Following on from the meeting and notes, here's a list of systems we'd be interested in benchmarking. Please let me know if I can clarify anything. I've tried to be as comprehensive as possible in case any of them require specific tweaks, so some of them (e.g. the charge ones) are very similar to each other and may wind up being the same use case.

I also added some non-essential nice-to-have systems marked with [?] that seem like related examples to me, but lie outside the scope of what has been discussed for pontibus 0.0.1. Please don't treat these as examples that must be addressed, but more me wondering if they are currently feasible in pontibus.

System

solute + solvent (non-water) with custom, unreleased OFFXML
custom OFFXML with library charges on solute
custom OFFXML with library charges on solute and solvent
custom OFFXML with user-supplied charges on solute and solvent
custom OFFXML with NAGL charges on solute/solvent
custom OFFXML with a non-AM1-BCC charge method (e.g. ChargeIncrementHandler -- basically anything that Interchange can currently handle) for solute and solvent
Solvent that is water with virtual sites
Solvent that is a re-fit water model with custom ions
Solvent that is custom water with virtual sites [?]
Solvent that is not water with virtual sites [?]
Solute that has virtual sites [??]
Custom OFFXML with different 1-4 interactions [?]
Custom OFFXML with alternate functional forms supported by smirnoff plugins, e.g. a double exponential vdW term [?]

Settings

with shorter equilibration times (eg 2 ns)
with different production times
with or without HMR and with a 1 fs, 2 fs, 4 fs time step

Retrieval

I realised while writing this up I don't have a great idea of what is retrievable, but the following could be useful:

statistics over the simulation [?] (e.g. density, potential energy, temperature/pressure over the trajectory)
trajectory itself [?]
end-point conformations / PDBs

IAlibay · 2025-03-10T15:55:03Z

I also added some non-essential nice-to-have systems marked with [?] that seem like related examples to me, but lie outside the scope of what has been discussed for pontibus 0.0.1.

Thanks I hadn't seen that before I responded on slack. From a quick scan, some of these are more feasible than other, but for the sake of time / avoiding too much text, I'm going to put all of those as "to be discussed mid-April" if that's ok?

IAlibay · 2025-03-10T19:54:10Z

Apologies in advance for the briefness. What I'll do is try to provide quick answers to everything here, and then hopefully will document in the network generation scripts, but worse case scenario the rough answer will be here.

System

solute + solvent (non-water) with custom, unreleased OFFXML

Possible, will document in mnsol example script.

You can just pass the custom OFFXML string (overly clarifying - I mean this: openforcefield/openff-toolkit#2021) as an input to the settings.solvent_forcefield_settings.forcefields and settings.vacuum_forcefield_settings.forcefields lists.

Note: you should avoid passing in a custom serialized file because it's hard to keep the file around on execution.

custom OFFXML with library charges on solute

This is unfortunately a situation I didn't account for properly. Right now you can fully rely on library charges for the solvent but not the solute.

The way to do this right now would be to load in the library charges into your OpenFF Molecule ahead of time and then pass that to the SmallMoleculeComponent.

custom OFFXML with library charges on solute and solvent

Library charges on the solvent can be done by setting settings.solvation_settings.assign_solvent_charges=False. This will ignore partial charges on the ExtendedSolventComponent and try to fetch them from the library charges. If no library charges are found, the Protocol will fail.

custom OFFXML with user-supplied charges on solute and solvent

That's the easiest case! Just add the charges to a OpenFF Molecule for both your solute and solvent and feed it to a SmallMoleculeComponent. For the solvent, the SmallMoleculeComponent is then fed to the ExtendedSolventComponent. You then need to set settings.solvation_settings.assign_solvent_charges=True to make sure the solvent charges get picked up.

custom OFFXML with NAGL charges on solute/solvent

Here the best would be to assign NAGL charges ahead of time to your Molecule and feed that in (that way you get good provenance). Otherwise you can use NAGL to add partial charges at runtime to uncharged Molecules using settings.partial_charge_settings. Specifically you would need to set partial_charge_settings.partial_charge_method='nagl' and partial_charge_settings.nagl_model to a given value.

custom OFFXML with a non-AM1-BCC charge method (e.g. ChargeIncrement).

I'm not sure. I've only ever encountered / tested ChargeIncrement in the context of a virtual site on a solvent molecule.

At this point in time, I'm not sure. Could you provide more information on the use case maybe?

solvent that is water with virtual sites

This is out of scope on the initial work plan delievered.

Right now the answer is that it probably works (will maybe provide provisional OPC benchmarks later this week), but it would require me spending some time doing some alchemical system testing to check everything is as expected & works fine. Without that, I would strongly advise against fully relying on it in production.

So possibly read it as "if you must run stuff this coming month, but I would urge reserving some of my time to verify the science".

If you really want to try it, you can pass your favourite water model through as an OFFXML and set settings.integrator_settings.reassign_velocities=True.

Solvent that is ref-it water model with custom ions

Maybe I'm missing something here, but I guess just pass this to the OFFXML definition list?

Solvent that is a custom water with virtual sites

Same answer as the virtual site solvent case above. Out of scope, and could be wrong, but if you must it may work.

Solvent that is not water with virtual sites

Would need more work than water with virtual sites. So more out of scope.

Solute that has virtual sites.

Definitely out of scope.

Custom offxml with different 1-4 interactions

Unclear on the details of what you mean here. Probably out of scope.

Custom offxml with alternate functional forms

Definitely out of scope.

Settings

with shorter equilibration times & different production times

There are two equilibrations, the alchemical and non-alchemical ones, and one alchemical production. These are controlled by vacuum_equil_simulation_settings and vacuum_simulation_settings for the vacuum phase, and solvent_equil_simulation_settings and solvent_simulaltion_settings for the solvent phase respectively.

Current defaults are:

# vacuum
## note gas phase has no volume or pressure control so we set NVT equil to None
vacuum_equil_simulation_settings.equilibration_length_nvt=None
vacuum_equil_simulation_settings.equilibration_length=0.2 * unit.nanosecond
vacuum_equil_simulation_settings.production_length=0.5 * unit.nanosecond

vacuum_simulation_settings.equilibration_length=0.5 * unit.nanosecond
vacuum_simulation_settings.production_length=2.0 * unit.nanosecond

# solvent
# non-alchemical equilibration: NVT equil followed by NPT equil followed by NPT prod equil
solvent_equil_simulation_settings.equilibration_length_nvt=0.5 * unit.nanosecond
solvent_equil_simulation_settings.equilibration_length=0.5 * unit.nanosecond
solvent_equil_simulation_settings.production_length=9.5 * unit.nanosecond

solvent_simulation_settings.equilibration_length=1.0 * unit.nanosecond
solvent_simulation_settings.production_length=10.0 * unit.nanosecond

with or without HMR with different timesteps

You can control HMR with settings.solvent_forcefield_settings.hydrogen_mass and settings.vacuum_forcefield_settings.hydrogen_mass.

Current default is set at 3.023841.

You can control the timestep using settings.integrator_settings.timestep current default is 4.0 * unit.timestep.

Retreival

statistics over the simulation (density, potential energy, temperature/pressure).

The data is available somewhere but it's not hooked up to anything right now since it wasn't something that was asked for in the initial request. Adding this to the results objects from the non-alchemical simulation is feasible as a medium-low cost, but something to discuss later.

When running manually / locally you'll find a log file named equil_simulation.log that has the data for the non-alchemical equilibration.

note: file retreival is not possible using alchemiscale.

alchemical statistics

The ProtocolResults object contains various things which you can get, see here: https://pontibus.readthedocs.io/en/latest/api/protocols/generated/pontibus.protocols.solvation.ASFEProtocolResult.html#pontibus.protocols.solvation.ASFEProtocolResult

These include; MBAR overlap, REPEX matrix, forward and reverse convergence, the free energy estimates, the MBAR errors, and the standard deviation.

the trajectory itself

Over alchemiscale it will not be possible to get the trajectory. This is a limitation in alchemiscale itself.

When running locally / via traditional HPC, you will get an XTC for the equilibration and a netcdf for the alchemical production. XTC files are standard and can be read with MDAnalysis. There is tooling in OpenFE for reading the netcdf file (we have a custom MDAnalysis reader etc...), if it's critical, someone in the team can signpost / help. However, given that this wasn't in the approved plan or the initial request, I would prefer it if it was something that would be left for follow-up in April.

It would be possible to add better programmatic access to these via the ProtocolResults objects, that could be something for a v0.0.2.

end-point conformation / PDBs

As above with the trajectory. You can get that from the trajectory or we could add it as an additional artifact, but not currently available in v0.0.1 since it wasn't requested.

lilyminium · 2025-03-10T20:56:16Z

Thanks for the super thorough reply, @IAlibay! Just a quick non-comprehensive response while I digest:

The way to do this right now would be to load in the library charges into your OpenFF Molecule ahead of time and then pass that to the SmallMoleculeComponent.

Great, sounds like as long as we can load in charges onto the OpenFF molecule then that will get respected? Is the way to do that via the Molecule.partial_charges property or is there a keyword we would need?

settings.solvation_settings.assign_solvent_charges=True

Just to make sure I have the correct way around:

True -- charges will get picked up from the Molecule
False -- charges will get re-assigned using the force field (e.g. for librarycharges)?

Could you provide more information on the use case maybe?

This was definitely a stretch case, but one currently supported use-case would be if we re-fit BCCs and assigned charges using the AmberTools AM1 and applied our own BCCs after the fact. This isn't currently a need, I just wondered how much the SMIRNOFF spec would be implemented in case we did want to explore.

partial_charge_settings.nagl_model to a given value

Would this be the same string that's passed into the OpenFF toolkit? More specifically, should this be the name of a published NAGL model in openff-nagl-models where the full file path is resolved by nagl-models?

Solvent that is ref-it water model with custom ions
Maybe I'm missing something here, but I guess just pass this to the OFFXML definition list?

Thanks! No not missing anything, I just wanted to check in case water was still treated specially and needed a released force field.

Custom offxml with different 1-4 interactions
Unclear on the details of what you mean here.

Here I mean if we tuned the strength of 1-4 interactions in the SMIRNOFF force field, specifically setting the 1-4 scaling coefficient from 0.5 (vdW) or 0.83 (electrostatics) to another fine-tuned value. This is definitely not an urgent use case but another question that explores how much the SMIRNOFF spec is implemented (Interchange for example shouldn't have an issue creating an OpenMM system with this), or if there's any hard-coding we need to be keeping an eye on.

IAlibay · 2025-03-10T21:21:58Z

Great, sounds like as long as we can load in charges onto the OpenFF molecule then that will get respected? Is the way to do that via the Molecule.partial_charges property or is there a keyword we would need?

Yeah it's just the standard Molecule.partial_charges property.

The caveat here is that if the partial charges are allmost all equal to zero they will get ignored.
If the property is not assigned it will attempt to auto-assign partial charges (defaults to AmberTools + AM1BCC), unless it's the solvent and settings.solvation_settings.assign_solvent_charges=True.

Just to make sure I have the correct way around:
True -- charges will get picked up from the Molecule
False -- charges will get re-assigned using the force field (e.g. for librarycharges)?

Yeah it's not the most intuitive keyword (would welcome suggestions for changing). If it's true, the partial charges in the OpenFF Molecule will be passed on charge_from_molecule when calling ForceField.create_interchange. Otherwise it will be excluded.

Note that if True and the Molecule does not have partial charges, then partial charges will be assigned at runtime (defaults to AmberTools + AM1BCC). It does NOT fallback to the ToolkitAM1BCC handler (we explicitly disallow that).

Would this be the same string that's passed into the OpenFF toolkit? More specifically, should this be the name of a published NAGL model in openff-nagl-models where the full file path is resolved by nagl-models?

It's exactly the same thing you would pass to validate_nagl_model_path. We use that method directly to get the absolute path.
Note that unpublished NAGL models probably won't work if used over alchemiscale (since we can't ship the files to the compute workers).

Here I mean if we tuned the strength of 1-4 interactions in the SMIRNOFF force field, specifically setting the 1-4 scaling coefficient from 0.5 (vdW) or 0.83 (electrostatics) to another fine-tuned value. This is definitely not an urgent use case but another question that explores how much the SMIRNOFF spec is implemented (Interchange for example shouldn't have an issue creating an OpenMM system with this), or if there's any hard-coding we need to be keeping an eye on.

I'm not sure. I think I have follow-up questions on that one. So the best I can say is it might be possible, would need a lot of testing though. 100% not possible with different mixing rules as-is.

jameseastwood assigned IAlibay Mar 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark systems list #52

Benchmark systems list #52

lilyminium commented Mar 10, 2025

IAlibay commented Mar 10, 2025

IAlibay commented Mar 10, 2025

lilyminium commented Mar 10, 2025

IAlibay commented Mar 10, 2025 •

edited

Loading

Benchmark systems list #52

Benchmark systems list #52

Comments

lilyminium commented Mar 10, 2025

System

Settings

Retrieval

IAlibay commented Mar 10, 2025

IAlibay commented Mar 10, 2025

System

Settings

Retreival

lilyminium commented Mar 10, 2025

IAlibay commented Mar 10, 2025 • edited Loading

IAlibay commented Mar 10, 2025 •

edited

Loading