Supplementary material for "Discrete and mixed-variable experimental design with surrogate-based approach"
Full paper link: Link
Experimental design aims to collect informative data efficiently and to derive meaningful conclusions within resource constraints. We propose an alternative framework built on mixed-integer surrogates and acquisition functions, adopting PWAS (Piecewise Affine Surrogate-based optimization), which is designed for mixed-variable problems subject to linear constraints. PWAS handles discrete and mixed decision variables directly, which allows a more realistic representation of real-world problems. PWAS also accommodates the linear equality and inequality constraints commonly encountered in physical systems, so that the solutions it proposes are feasible.
We demonstrate the effectiveness of PWAS in optimizing experimental designs through three case studies, each with a different size of design space and numerical complexity:
- Optimization of reaction conditions for Suzuki–Miyaura cross-coupling (fully categorical)
- Optimization of crossed-barrel design to augment mechanical toughness (mixed-integer)
- Solvent design for enhanced Menschutkin reaction rate (mixed-integer and categorical with linear constraints)
By comparing with conventional optimization algorithms, we offer insights into the practical applicability of PWAS.
We refer readers to the paper for detailed discussions.
The case studies and the files needed to reproduce the results in the paper are available in this repository.
To obtain a local copy of the repository:
git clone https://github.com/MolChemML/ExpDesign.git
To run the case studies, the following steps need to be followed:
First, install the PWAS package.
- 🔴IMPORTANT: PWAS has external dependencies. See the package repository for detailed installation instructions for the MILP solvers used by PWAS. You can either obtain a free academic license (if applicable) for GUROBI, download the free GLPK package, or interface other MILP solvers following the instructions noted in PWAS. We used GUROBI for our case studies.
- See an overview of PWAS at this section
Second, to run the case studies and the analysis that generates the figures, install the following additional packages, which are used to load the datasets and export the results:
- pandas >= 2.1.0
- openpyxl >= 3.1.2
- seaborn >= 0.13.2
- matplotlib >= 3.8.3
- scikit-learn >= 1.3.0 (imported as sklearn)
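As a convenience, the minimum versions listed above can be checked from Python before running the case studies. The snippet below is a minimal sketch, not part of the repository: the helper names `as_tuple` and `check_requirements` are hypothetical, the version comparison is deliberately coarse (numeric parts only), and it assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed above. "scikit-learn" is the distribution
# name of the package that is imported as `sklearn`.
REQUIRED = {
    "pandas": "2.1.0",
    "openpyxl": "3.1.2",
    "seaborn": "0.13.2",
    "matplotlib": "3.8.3",
    "scikit-learn": "1.3.0",
}


def as_tuple(version_string):
    """Turn '3.8.3' into (3, 8, 3), keeping only leading numeric parts."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': stop after the numeric prefix
            break
    while len(parts) < 3:    # pad so that '2.1' compares like '2.1.0'
        parts.append(0)
    return tuple(parts)


def check_requirements(required=REQUIRED):
    """Return {package: status}; status is 'ok', 'too old (...)', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if as_tuple(installed) >= as_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for name, status in check_requirements().items():
    print(f"{name:>14}: {status}")
```

Any package reported as missing or too old can then be installed or upgraded with pip before running the case studies.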
Other notes:
- For the information on each case study, please see the relevant folder noted at Case studies
- We used the Olympus package to run comparison studies. Please see the forked version for the relevant updates required to run the case studies.
- 🔴IMPORTANT: Olympus requires tensorflow==1.15; therefore, a Python version < 3.8 is required.
- The yml file used by the authors to run the comparison studies is included: olympus_pwas_comp.yml
The PWAS package is available in its repository and can be installed with:
pip install pwasopt
[Figure: flowchart of the PWAS solver]
[Figure: flowchart of PARC, the package used to fit the surrogate]
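To give a rough feel for what a piecewise affine surrogate is, the sketch below evaluates one on a two-dimensional input. It is illustrative only and is not the PARC or PWAS code: all coefficients are made up, the function names are hypothetical, and the choice of selecting a region by the largest affine score is an assumption made here because it yields polyhedral regions. The point of such a model is that every piece is affine, which is what makes it compatible with the MILP solvers mentioned above.

```python
# Illustrative only: every coefficient below is made up, and this is not
# the PARC or PWAS implementation.

def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def partition_index(x, separators):
    """Pick the region whose affine score omega . x + gamma is largest."""
    scores = [dot(omega, x) + gamma for omega, gamma in separators]
    return max(range(len(scores)), key=scores.__getitem__)


def pwa_predict(x, separators, local_models):
    """Evaluate the affine model a . x + b of the region that contains x."""
    j = partition_index(x, separators)
    a, b = local_models[j]
    return dot(a, x) + b


# Two regions in a 2-D input space, separated by the line x0 = 0.5.
separators = [
    ([-1.0, 0.0], 0.5),    # largest score when x0 < 0.5
    ([1.0, 0.0], -0.5),    # largest score when x0 > 0.5
]
local_models = [
    ([2.0, 0.0], 0.0),     # f(x) = 2*x0            in region 0
    ([-1.0, 1.0], 1.5),    # f(x) = -x0 + x1 + 1.5  in region 1
]

print(pwa_predict([0.25, 0.0], separators, local_models))  # region 0 -> 0.5
print(pwa_predict([0.75, 1.0], separators, local_models))  # region 1 -> 1.75
```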
For Suzuki coupling and crossed barrel case studies, we compare the performances of PWAS with the algorithms implemented in the following packages:
- Genetic
- Hyperopt (tpe)
- BoTorch (BO with GP)
- EDBO (BO with GP trained on reaction optimization data)
Additionally, we consider Random Search as a baseline.
We note that Random Search, Genetic, Hyperopt, and BoTorch have been interfaced in the Olympus package; we therefore use the algorithmic structure implemented in that package for the benchmark tests, with the default solver parameters. A customized fork tailored for our testing is also available on GitHub (branch “pwas_comp”), where you can see all the modifications. Note that some modifications are only necessary on Windows systems.
Each test was repeated 30 times. Within each run, the maximum number of iterations was set to 50, with 10 initial samples.
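The repetition protocol above can be sketched with a plain random-search baseline. This is a minimal illustration under stated assumptions, not the benchmark code from the repository or from Olympus: the objective `toy_objective` and its variable ranges are invented, the toy problem is a minimization, and the initial samples are assumed to count toward the 50-evaluation budget.

```python
import random
import statistics

N_REPEATS = 30   # independent runs per solver
MAX_EVALS = 50   # evaluation budget per run
N_INIT = 10      # initial samples (assumed here to be part of the budget)


def toy_objective(x):
    """Hypothetical mixed-variable test function, to be minimized."""
    n, theta, kind = x
    penalty = {"a": 0.0, "b": 0.5, "c": 1.0}[kind]
    return (n - 8) ** 2 / 16.0 + (theta - 0.3) ** 2 + penalty


def sample(rng):
    """Draw one integer, one continuous, and one categorical variable."""
    return (rng.randint(6, 12), rng.uniform(0.0, 1.0), rng.choice("abc"))


def random_search_run(seed):
    """Return the best-so-far objective value after each evaluation."""
    rng = random.Random(seed)
    best, trace = float("inf"), []
    for _ in range(MAX_EVALS):
        best = min(best, toy_objective(sample(rng)))
        trace.append(best)
    return trace


# One trace per repetition, then the mean best-so-far value per evaluation.
traces = [random_search_run(seed) for seed in range(N_REPEATS)]
mean_trace = [statistics.mean(column) for column in zip(*traces)]

print(f"mean best after initial samples: {mean_trace[N_INIT - 1]:.3f}")
print(f"mean best after full budget:     {mean_trace[-1]:.3f}")
```

For a surrogate-based solver, the first `N_INIT` evaluations would be the initial samples and the remaining ones would be chosen by the acquisition step; random search draws every point the same way, which is why it serves as a baseline.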
For the solvent design case study, the relatively large number of constraints makes comparisons with the aforementioned solvers impractical; instead, PWAS is compared with a recently proposed DoE-QM-CAMD method.
Case study 1: optimization of reaction conditions for Suzuki–Miyaura cross-coupling
- Design space: fully categorical
- Optimization goal: to identify the optimal combinations of precursors that maximize the yield of the desired product
- Parameters to optimize: aryl halide (X), boronic acid derivative (Y), base, ligand, and solvent
- Notes on the code:
  - Relevant folder: suzuki_edbo
  - The files needed to run each optimization method are included.
  - The results and the files used to generate figures are available at z_results
- Results:
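To make the fully categorical design space of the Suzuki–Miyaura case study concrete, the sketch below enumerates every combination of five categorical parameters and one-hot encodes a candidate. The level labels and their counts are placeholders, not the options in the dataset, and one-hot encoding is shown only as a common way to turn categories into numbers; see the PWAS package for how it treats categorical variables internally.

```python
from itertools import product

# Placeholder levels: the real candidate lists are in the dataset files.
design_space = {
    "aryl_halide": ["X1", "X2"],
    "boronic_acid": ["Y1", "Y2", "Y3"],
    "base": ["B1", "B2"],
    "ligand": ["L1", "L2", "L3"],
    "solvent": ["S1", "S2"],
}


def enumerate_candidates(space):
    """List every combination of one level per parameter."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


def one_hot(candidate, space):
    """Concatenate one 0/1 indicator block per categorical parameter."""
    vector = []
    for name, levels in space.items():
        vector.extend(1 if candidate[name] == level else 0 for level in levels)
    return vector


candidates = enumerate_candidates(design_space)
print(len(candidates))                       # 2 * 3 * 2 * 3 * 2 = 72
print(one_hot(candidates[0], design_space))  # first level of every parameter
```

The number of combinations grows multiplicatively with the number of levels per parameter, which is why sample-efficient search matters when every evaluation is an experiment.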
Case study 2: optimization of crossed-barrel design to augment mechanical toughness
- Design space: mixed-integer
- Optimization goal: to identify the optimal combination of structural parameters that maximizes the toughness of the resulting crossed-barrel structure while not exceeding a specified force threshold
- Parameters to optimize: number of hollow columns ($n$), twist angle of the columns ($\theta$), outer radius of the columns ($r$), and thickness of the hollow columns ($t$)
- Notes on the code:
  - Relevant folder: crossed_barrel
  - The files needed to run each solver are included.
  - The results and the files used to generate figures are available at z_results
- Results:
Case study 3: solvent design for enhanced Menschutkin reaction rate
- Design space: mixed-integer and categorical
- Optimization goal: to identify the optimal solvent compositions to enhance the reaction rate of the Menschutkin reaction of phenacyl bromide and pyridine
- Variables to optimize:
  - 46 integer variables indicating the number of each atom group present in the designed solvent
  - 1 auxiliary categorical variable to delineate the solvent's structure (acyclic, monocyclic, bicyclic)
  - 7 auxiliary binary variables for structure-related constraints
  - Along with 115 linear inequality constraints and 5 linear equality constraints to enforce structure-property-, chemical-feasibility-, and complexity-related solution features.
    - For instance, constraints are used to enforce the octet rule and to specify the minimum octanol/water partition coefficient and other relevant properties.
    - See the detailed list in Gui et al, 2023
    - Also formatted in this Excel file
- Notes on the code:
  - Relevant folder: solvent design case study
    - main.py: run PWAS to solve the case study
    - gc_lnkCal.py: calculate the ln(k) data from group contribution
    - qm_simulator.py: return the ln(k) value given the structure of the solvent; the ln(k) value is obtained from quantum-mechanical (QM) calculations
    - solvent_list_matrix.xlsx: Excel file including the full feasible design space, bounds and constraints on the optimization variables, and group contribution values
      - This file is updated based on Gui et al, 2023
  - The results and the files used to generate figures are available at z_results
- Results:
Solvent properties of the initial samples (left), the first 10 active-learning samples (middle), and the last 10 active-learning samples (right):
Bubble chart of chemical properties of the solvents:
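The linear inequality and equality constraints of the solvent design case study have the generic form A_ineq x <= b_ineq and A_eq x = b_eq. The sketch below checks a candidate against such constraints on a three-variable toy problem. The matrices and the helper name `is_feasible` are invented for illustration and are unrelated to the 115 inequality and 5 equality constraints of the case study, which are listed in the Excel file and in Gui et al, 2023.

```python
def is_feasible(x, A_ineq, b_ineq, A_eq, b_eq, tol=1e-9):
    """Check A_ineq x <= b_ineq and A_eq x == b_eq, both within tol."""
    def row_dot(row):
        return sum(a * xi for a, xi in zip(row, x))

    ineq_ok = all(row_dot(row) <= b + tol for row, b in zip(A_ineq, b_ineq))
    eq_ok = all(abs(row_dot(row) - b) <= tol for row, b in zip(A_eq, b_eq))
    return ineq_ok and eq_ok


# Toy problem with three integer group counts (n1, n2, n3); all made up.
A_ineq = [
    [1, 1, 1],    # n1 + n2 + n3 <= 6 (cap on the number of groups)
    [-1, 0, 0],   # n1 >= 1, written as -n1 <= -1
]
b_ineq = [6, -1]
A_eq = [[1, -1, 0]]  # n1 - n2 == 0
b_eq = [0]

print(is_feasible([2, 2, 1], A_ineq, b_ineq, A_eq, b_eq))  # True
print(is_feasible([0, 0, 3], A_ineq, b_ineq, A_eq, b_eq))  # False: n1 >= 1 violated
print(is_feasible([3, 2, 1], A_ineq, b_ineq, A_eq, b_eq))  # False: n1 != n2
```

Because the constraints are linear, they can be passed to a MILP solver directly, which is how a solver can propose only feasible candidates rather than filtering infeasible ones afterwards.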
Mengjia Zhu, Austin Mroz, Lingfeng Gui, Kim Jelfs, Alberto Bemporad, Ehecatl Antonio del Río Chanona, and Ye Seol Lee
This repository is distributed without any warranty. Please cite the paper below if you use it.
@article{ExpDesign2024,
author ="Zhu, Mengjia and Mroz, Austin and Gui, Lingfeng and Jelfs, Kim E. and Bemporad, Alberto and del Río Chanona, Ehecatl Antonio and Lee, Ye Seol",
title ="Discrete and mixed-variable experimental design with surrogate-based approach",
journal ="Digital Discovery",
year ="2024",
volume ="3",
issue ="12",
pages ="2589-2606",
publisher ="RSC",
doi ="10.1039/D4DD00113C",
url ="http://dx.doi.org/10.1039/D4DD00113C",
}
MIT