FieldSchNet provides a deep neural network for modeling the interaction of molecules and external environments as described in [1]. The package builds on the SchNetPack infrastructure [2] and provides functionality for training and deploying FieldSchNet models for simulating molecular spectra and reactions in the presence of fields, continuum solvents, as well as in a QM/MM setup.
- Python 3
- PyTorch (>=0.4.1)
- NumPy
- ASE
- Hydra
- schnetpack (>=0.3.0)
- Optional: tensorboardX
Note: We recommend using a GPU for training the neural networks.
Clone the repository:
git clone git@github.com:atomistic-machine-learning/field_schnet.git
cd field_schnet
Install requirements:
pip install -r requirements.txt
Install FieldSchNet:
pip install .
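To confirm that the requirements listed above can be found by the Python interpreter you installed into, a small standard-library check such as the following may help. It is a sketch, not part of FieldSchNet; the import names (`ase` for ASE, `hydra` for Hydra) are the usual module names of those packages, and FieldSchNet's own import name is not assumed here.

```python
import importlib.util

# Import names of the requirements listed above; tensorboardX is optional.
REQUIRED = ["torch", "numpy", "ase", "hydra", "schnetpack"]
OPTIONAL = ["tensorboardX"]


def check_requirements(required=REQUIRED, optional=OPTIONAL):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in list(required) + list(optional):
        # find_spec only locates the module; it does not import it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        kind = "optional" if name in OPTIONAL else "required"
        print(f"{name:12s} ({kind}): {'found' if found else 'MISSING'}")
```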
Here, we show how to train a basic FieldSchNet model for predicting energies, forces, dipole moments and polarizability tensors using the ethanol molecule as an example. In addition, we demonstrate how a trained model can be used in molecular dynamics simulations to compute infrared and Raman spectra.
All FieldSchNet scripts used in the example are added to your PATH during installation.
A reference dataset ethanol_vacuum.db in ASE db format (see [1] for details on the data) can be found in the example directory.
A FieldSchNet model can be trained on this dataset via
field_schnet_run.py data_path=<PATH/TO/>ethanol_vacuum.db basename=<modeldir> cuda=true
where data_path should point to the reference dataset, basename indicates the model directory and the cuda=true flag activates GPU training.
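Every setting on this command line is a Hydra-style `key=value` pair. When runs are scripted, a helper that assembles the argument list can keep these pairs consistent. The sketch below is hypothetical (the function name is not part of FieldSchNet) and only builds the list; passing it to `subprocess.run(cmd, check=True)` would launch the run.

```python
def build_run_command(data_path, basename, cuda=True, **overrides):
    """Return the argument list for a field_schnet_run.py call.

    Extra keyword arguments become further key=value overrides, e.g.
    mode="eval" or tradeoffs="electromagnetic". Dotted keys such as
    "model.field_mode" can be passed by unpacking a dict:
    **{"model.field_mode": "qmmm"}.
    """
    args = [
        "field_schnet_run.py",
        f"data_path={data_path}",
        f"basename={basename}",
        f"cuda={'true' if cuda else 'false'}",
    ]
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    import shlex

    cmd = build_run_command("path/to/ethanol_vacuum.db", "ethanol_model", mode="eval")
    print(shlex.join(cmd))
    # → field_schnet_run.py data_path=path/to/ethanol_vacuum.db basename=ethanol_model cuda=true mode=eval
```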
The training progress will be logged in <modeldir>/log, either as CSV (default) or as TensorBoard event files.
A training run using default settings should take approximately five hours on a notebook GPU with 2 GB VRAM.
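With the default CSV logging, the epoch with the lowest validation loss can be looked up with the standard `csv` module. This is a sketch under assumptions: the file name inside <modeldir>/log and the column header `Validation loss` are guesses, so check the header of your own log before relying on it.

```python
import csv


def best_validation_row(log_file, column="Validation loss"):
    """Return the row of a CSV training log with the lowest value in `column`.

    The default column name is an assumption; adapt it to your log header.
    """
    with open(log_file, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"no rows in {log_file}")
    return min(rows, key=lambda row: float(row[column]))
```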
To evaluate the trained model with the best validation error, call
field_schnet_run.py data_path=<PATH/TO/>ethanol_vacuum.db basename=<modeldir> cuda=true mode=eval
which will run on the test set and write a result file evaluation.txt into the model directory. The best model is stored in the file best_model in the same directory.
Once a model has been trained for ethanol, it can be used to simulate various molecular spectra (a pre-trained example model can be found under example/ethanol_vacuum.model).
We run a molecular dynamics (MD) simulation using the md module of SchNetPack (more details can be found in the SchNetPack MD tutorial).
A basic input file template md_input.yaml for using FieldSchNet in conjunction with SchNetPack MD is provided in the example directory.
To run a simulation, a few adaptations to this file are necessary:
- The model_file entry in the calculator block must be changed to a valid path to a trained model (e.g. <modeldir>/best_model or example/ethanol_vacuum.model).
- A path to an XYZ file containing an initial ethanol structure must be set in molecule_file (system block). The example directory contains a suitable ethanol structure (ethanol_initial.xyz).
- The simulation_dir placeholder should be changed to a reasonable name for the experiment.
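The three adaptations can also be applied programmatically, which is convenient when preparing several runs. The sketch below assumes the template has been loaded into a nested dict (for example with PyYAML's `yaml.safe_load`) and that `simulation_dir` is a top-level key while `model_file` and `molecule_file` sit in the `calculator` and `system` blocks as described above; the function name is hypothetical.

```python
import copy


def adapt_md_config(template, model_file, molecule_file, simulation_dir):
    """Return a copy of the MD input config with the three required entries set.

    Assumed layout: top-level simulation_dir, calculator.model_file,
    system.molecule_file. Check this against the actual md_input.yaml.
    """
    config = copy.deepcopy(template)  # leave the loaded template untouched
    config["simulation_dir"] = simulation_dir
    config.setdefault("calculator", {})["model_file"] = model_file
    config.setdefault("system", {})["molecule_file"] = molecule_file
    return config


# Possible usage with PyYAML (not required by the function itself):
# import yaml
# with open("md_input.yaml") as f:
#     template = yaml.safe_load(f)
# config = adapt_md_config(template, "example/ethanol_vacuum.model",
#                          "example/ethanol_initial.xyz", "ethanol_md")
# with open("md_ethanol.yaml", "w") as f:
#     yaml.safe_dump(config, f)
```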
The simulation is started with
spk_md.py md_input.yaml
It will generate the directory specified in simulation_dir and store the results of the simulation there.
MD-related data (such as forces, properties and velocities) are stored in an HDF5 file (<simulation_dir>/simulation.hdf5).
In general, data can be extracted using the HDF5Loader utility of SchNetPack (schnetpack.md.utils.hdf5_data).
For convenience, we provide the script field_schnet_extract_hdf5.py, which can be used to convert the sampled structures to XYZ format:
field_schnet_extract_hdf5.py <simulation_dir>/simulation.hdf5 <xyz_directory>
This will generate a trajectory in XYZ format in <xyz_directory>.
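The XYZ format itself is simple: each frame starts with the number of atoms, followed by a free-form comment line and one `symbol x y z` line per atom. The sketch below writes a multi-frame trajectory in this layout from a NumPy array; it only illustrates the format and is not the code of field_schnet_extract_hdf5.py. Positions in Angstrom are assumed.

```python
import numpy as np


def frames_to_xyz(symbols, positions, comment="frame {i}"):
    """Format a trajectory as multi-frame XYZ text.

    symbols:   sequence of N element symbols
    positions: array of shape (n_frames, N, 3), assumed to be in Angstrom
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 3 or positions.shape[1:] != (len(symbols), 3):
        raise ValueError("positions must have shape (n_frames, n_atoms, 3)")
    lines = []
    for i, frame in enumerate(positions):
        lines.append(str(len(symbols)))    # line 1: number of atoms
        lines.append(comment.format(i=i))  # line 2: comment
        for sym, (x, y, z) in zip(symbols, frame):
            lines.append(f"{sym} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"
```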
Once a simulation has been performed, molecular spectra can be computed from a simulation.hdf5 file with the field_schnet_spectra_hdf5.py script.
To compute, store and plot spectra based on the above MD run, execute:
field_schnet_spectra_hdf5.py <simulation_dir>/simulation.hdf5 <spectrum.npz> --spectra ir raman --plot --skip_initial 10000
This will generate infrared as well as polarized and depolarized Raman spectra and plot them to the screen (--plot). The spectrum data will also be stored in the <spectrum.npz> file. --skip_initial 10000 indicates that the first 10000 steps (5 ps) of the trajectory should be treated as an equilibration period and skipped.
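A common way to obtain an IR spectrum from an MD trajectory is to Fourier transform the autocorrelation function of the time derivative of the dipole moment. The sketch below shows that generic technique; it is not the implementation of field_schnet_spectra_hdf5.py. It assumes the dipoles are available as an array of shape (n_steps, 3) and uses a time step of 0.5 fs, which follows from 10000 steps corresponding to 5 ps (5000 fs / 10000 = 0.5 fs). Intensities are unnormalized and no quantum correction factor is applied.

```python
import numpy as np

# Speed of light in cm/fs, converts frequencies (1/fs) to wavenumbers (cm^-1).
C_CM_PER_FS = 2.99792458e-5


def ir_spectrum(dipoles, dt_fs=0.5, skip_initial=10000):
    """Unnormalized IR spectrum from a dipole-moment time series.

    dipoles: array of shape (n_steps, 3), one dipole vector per MD step.
    Returns (wavenumbers in cm^-1, intensities in arbitrary units).
    """
    mu = np.asarray(dipoles, dtype=float)[skip_initial:]  # drop equilibration
    n = mu.shape[0]
    if n < 2:
        raise ValueError("not enough steps left after skip_initial")
    # Time derivative of the dipole moment by finite differences.
    dmu = np.gradient(mu, dt_fs, axis=0)
    # Autocorrelation <dmu(0) . dmu(t)> via zero-padded FFT (Wiener-Khinchin),
    # summed over the x, y and z components.
    f = np.fft.rfft(dmu, n=2 * n, axis=0)
    acf = np.fft.irfft(np.sum(np.abs(f) ** 2, axis=1), n=2 * n)[:n]
    acf /= np.arange(n, 0, -1)  # number of overlapping pairs per lag
    # Damp the noisy long-lag tail with half a Hann window, then transform.
    acf *= np.hanning(2 * n)[n:]
    intensity = np.abs(np.fft.rfft(acf))
    wavenumbers = np.fft.rfftfreq(n, d=dt_fs) / C_CM_PER_FS
    return wavenumbers, intensity
```

Raman spectra are obtained analogously from the polarizability time series, with the isotropic and anisotropic parts giving the polarized and depolarized contributions.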
Please refer to field_schnet_spectra_hdf5.py --help for more details.
FieldSchNet uses Hydra for managing experiment configs. The default settings produce a relatively small FieldSchNet model for demonstration purposes. These settings can be modified via standard Hydra syntax using the configurations defined in src/scripts/configs. The currently used config can also be printed via
field_schnet_run.py --cfg job
and optionally be stored to a file and modified. Such a configuration file can then be used in an experiment with the command
field_schnet_run.py load_config=<PATH/TO/CONFIG>
which will override all changed default settings.
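Hydra (through OmegaConf) implements this layering itself. The sketch below only illustrates the semantics described above: settings from a loaded file or a dotted command-line override replace the matching defaults, while all other defaults are kept. The function names and the example keys are hypothetical, and unlike Hydra the sketch keeps override values as strings.

```python
import copy


def dotted_to_nested(override):
    """Turn 'model.field_mode=qmmm' into {'model': {'field_mode': 'qmmm'}}."""
    key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    nested = value
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge sub-blocks
        else:
            merged[key] = copy.deepcopy(value)  # replace leaf values
    return merged
```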
The properties fit by FieldSchNet are controlled via the tradeoff block. Properties can be added and removed by changing its entries. Different pre-defined settings are available and can be selected by adding tradeoffs=<setting> to the command line. For example, changing the training command to
field_schnet_run.py data_path=<PATH/TO/>ethanol_vacuum.db basename=<modeldir> cuda=true tradeoffs=electromagnetic
will also include NMR shielding tensors during model training.
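The name suggests that each entry of the tradeoff block weights the contribution of one property to the training loss, as is common in SchNetPack-style training; this README does not spell that out, so treat the following as an assumption. Under it, the combined loss is a weighted sum of per-property mean squared errors, and removing an entry removes that property from training. Property names in the test are hypothetical.

```python
import numpy as np


def weighted_loss(predictions, targets, tradeoffs):
    """Weighted sum of per-property mean squared errors.

    predictions, targets: dicts mapping property name -> array-like
    tradeoffs: dict mapping property name -> weight; only properties
    listed here contribute to the loss.
    """
    loss = 0.0
    for prop, weight in tradeoffs.items():
        diff = np.asarray(predictions[prop], dtype=float) - np.asarray(targets[prop], dtype=float)
        loss += weight * np.mean(diff ** 2)
    return loss
```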
In order to use FieldSchNet in QM/MM simulations, a model first needs to be trained on reference data containing either external charge positions and magnitudes or the corresponding external field acting on each atom. QM/MM training is initialized via
field_schnet_run.py data_path=<PATH/TO/>ethanol_qmmm.db basename=<modeldir> cuda=true model.field_mode=qmmm
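The two input variants are related: for bare point charges, the field acting on atom i follows from Coulomb's law, E_i = sum_j q_j (r_i - r_j) / |r_i - r_j|^3 in atomic units. The sketch below evaluates this expression with NumPy. It is an illustration under simplifying assumptions (no damping or cutoff, which production QM/MM setups may add) and not necessarily how FieldSchNet computes the field internally.

```python
import numpy as np


def field_from_point_charges(atom_positions, charge_positions, charges):
    """Electric field at each atom due to external point charges (atomic units).

    atom_positions:   (N, 3) in bohr
    charge_positions: (M, 3) in bohr
    charges:          (M,) in elementary charges
    Returns an (N, 3) array with E_i = sum_j q_j (r_i - r_j) / |r_i - r_j|^3.
    """
    r = (np.asarray(atom_positions, dtype=float)[:, None, :]
         - np.asarray(charge_positions, dtype=float)[None, :, :])  # (N, M, 3)
    dist = np.linalg.norm(r, axis=-1)                               # (N, M)
    q = np.asarray(charges, dtype=float)[None, :, None]             # (1, M, 1)
    return np.sum(q * r / dist[..., None] ** 3, axis=1)
```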
The FieldSchNet package provides two scripts, qmmm_client.py and qmmm_server.py (src/scripts/qmmm), to perform QM/MM simulations with the NAMD QM/MM interface (http://www.ks.uiuc.edu/Research/qmmm/).
To use a FieldSchNet model, the QM/MM specification in the NAMD configuration file must be updated to include:
QMPointChargeScheme none
QMBondScheme "cs"
qmSoftware "custom"
qmExecPath "<PATH/TO/>qmmm_client.py" --port <port_number>
where <port_number> specifies the port used in the socket interface.
To start the simulation, first initialize the QM/MM server, which performs the QM/MM computations:
python <PATH/TO/>qmmm_server.py <port_number> <max_connections>
<port_number> is the same as used above and <max_connections> is the maximum number of calculations accepted by the server (this should equal the number of steps in the QM/MM simulation). Use of the GPU can be toggled via --cuda.
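The message format exchanged by qmmm_client.py and qmmm_server.py is not documented here. The following standard-library sketch only illustrates the pattern described above: a server listening on a port that answers a fixed number of requests (one per MD step) and then exits. The request/reply protocol (one request per connection, end of request marked by shutting down the sending side) and the `handle` callback are hypothetical.

```python
import socket


def recv_all(conn, bufsize=4096):
    """Read from a socket until the peer shuts down its sending side."""
    chunks = []
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve(server_sock, max_connections, handle):
    """Answer exactly `max_connections` requests on a listening socket, then close it.

    `handle` maps request bytes to reply bytes; a real QM/MM server would
    evaluate the model on the received data here.
    """
    with server_sock:
        for _ in range(max_connections):
            conn, _addr = server_sock.accept()
            with conn:
                conn.sendall(handle(recv_all(conn)))


def send_request(port, payload, host="127.0.0.1"):
    """Client side: send one request, signal its end and return the reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)
        return recv_all(sock)
```

A server would then be started with, for example, `serve(socket.create_server(("127.0.0.1", port_number)), max_connections, handle)`.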
Given all other prerequisites have been satisfied, QM/MM can then be performed by running NAMD with the modified config file:
namd2 <namd_config>.conf
A sample setup for QM/MM with ethanol and a FieldSchNet model is provided in examples/qmmm_ethanol (without a pretrained model).
Sample reference data for training ethanol QM/MM models is provided in the ethanol_qmmm.db database as a tar archive (examples directory, see [1] for details on how the data was generated).
- [1] M. Gastegger, K. T. Schütt, K.-R. Müller. Machine learning of solvent effects on molecular spectra and reactions (2020). https://arxiv.org/abs/2010.14942
- [2] K. T. Schütt, P. Kessel, M. Gastegger, K. Nicoli, A. Tkatchenko, K.-R. Müller. SchNetPack: A Deep Learning Toolbox For Atomistic Systems. J. Chem. Theory Comput. 15(1), 448–455 (2018). DOI: 10.1021/acs.jctc.8b00908, arXiv:1809.01072