
Supplementary material for "Discrete and mixed-variable experimental design with surrogate-based approach"


MolChemML/ExpDesign


Contents

Description

Full paper link: Link

Experimental design aims to efficiently collect informative data and derive meaningful conclusions while operating within resource constraints. We propose an alternative framework built on mixed-integer surrogates and acquisition functions, adopting PWAS (Piecewise Affine Surrogate-based optimization), which is designed for mixed-variable problems subject to linear constraints. PWAS handles discrete and mixed decision variables directly, allowing a more realistic representation of real-world problems. It also accommodates the linear equality and inequality constraints commonly encountered in physical systems, so that the solutions it proposes are feasible.

We demonstrate the effectiveness of PWAS in optimizing experimental designs through three case studies, each with a different size of design space and numerical complexity:

  • Optimization of reaction conditions for Suzuki–Miyaura cross-coupling (fully categorical)
  • Optimization of crossed-barrel design to augment mechanical toughness (mixed-integer)
  • Solvent design for enhanced Menschutkin reaction rate (mixed-integer and categorical with linear constraints)

By comparing with conventional optimization algorithms, we offer insights into the practical applicability of PWAS.

We refer readers to the paper for detailed discussions.

How to use this repository

The case studies and relevant files needed to reproduce the results in the paper are available.

To obtain a local copy of the repository:

git clone https://github.com/MolChemML/ExpDesign.git

To run the case studies, the following steps need to be followed:

First, install the PWAS package.

  • 🔴IMPORTANT: PWAS has external dependencies. See the package repository for detailed installation instructions for the MILP solvers used by PWAS. You can obtain a free academic license for GUROBI (if applicable), download the free GLPK package, or interface another MILP solver by following the instructions in the PWAS repository. We used GUROBI for our case studies.
  • See the PWAS section of this document for an overview of the solver

Second, to run the case studies and the relevant analysis to generate figures, you need to include the following additional packages to load the dataset and export the results:

Other notes:

  • For the information on each case study, please see the relevant folder noted at Case studies
  • We used the Olympus package to run comparison studies. Please see the forked version for the relevant updates required to run the case studies.
    • 🔴IMPORTANT: Olympus requires tensorflow==1.15, so a Python version below 3.8 is required
    • The yml file used by the authors to run the comparison studies is included: olympus_pwas_comp.yml
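
Before setting up the comparison environment, the Python version requirement noted above can be checked with a short script. This is only a minimal sketch: the helper name `python_ok_for_tf115` is hypothetical and is not part of Olympus or of this repository.

```python
import sys


def python_ok_for_tf115(version_info=None):
    """Return True if the (major, minor) version is below 3.8.

    Hypothetical helper: it only encodes the "Python < 3.8" requirement
    stated in the notes, not any check performed by Olympus itself.
    """
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor < (3, 8)


if python_ok_for_tf115():
    print("Python version is compatible with tensorflow==1.15")
else:
    print("Python >= 3.8 detected: use a separate environment for Olympus")
```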

PWAS

The PWAS package is available in its own repository and can be installed via pip:

pip install pwasopt

The flowchart of the solver is shown here:

[Figure: flowchart of the PWAS solver]

where PARC is the package used to fit the surrogate, whose flowchart is shown below:

[Figure: flowchart of PARC]

Case studies

For Suzuki coupling and crossed barrel case studies, we compare the performances of PWAS with the algorithms implemented in the following packages:

  • Genetic
  • Hyperopt (tpe)
  • Botorch (BO with GP)
  • EDBO (BO with GP trained on reaction optimization data)

Additionally, we consider Random Search as a baseline.

We note that Random Search, Genetic, Hyperopt, and BoTorch have been interfaced in the Olympus package; therefore, we use the algorithmic structure implemented in that package for the benchmark tests, with the default solver parameters. A forked version customized for our tests is also available on GitHub in the branch "pwas_comp", where all the modifications can be seen. Note that some modifications are only necessary on Windows systems.

Each test was repeated 30 times. Within each run, the maximum number of iterations was set to 50, with 10 initial samples.
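
The repetition scheme can be sketched with a plain random-search baseline. This is an illustrative sketch, not the code used for the paper: the design space, the objective, and all function names are hypothetical placeholders, and it assumes that the budget of 50 includes the 10 initial samples.

```python
import random

N_REPEATS = 30   # independent runs per method
MAX_EVALS = 50   # evaluation budget per run (assumed to include initial samples)
N_INITIAL = 10   # initial samples drawn before any surrogate would be used

# Hypothetical categorical design space; the labels are placeholders.
SPACE = {
    "a": ["a1", "a2", "a3"],
    "b": ["b1", "b2", "b3", "b4"],
}


def toy_objective(choice):
    """Hypothetical stand-in for an experiment; higher is better (max is 8)."""
    return SPACE["a"].index(choice["a"]) + 2 * SPACE["b"].index(choice["b"])


def random_search(rng):
    """Run one random-search trial and return the best-so-far trace."""
    best = float("-inf")
    trace = []
    for _ in range(MAX_EVALS):
        choice = {name: rng.choice(options) for name, options in SPACE.items()}
        best = max(best, toy_objective(choice))
        trace.append(best)
    return trace


def run_benchmark(seed=0):
    """Repeat the trial N_REPEATS times, each with its own seed."""
    return [random_search(random.Random(seed + rep)) for rep in range(N_REPEATS)]


traces = run_benchmark()
# Mean best value after the initial phase and after the full budget.
mean_initial = sum(t[N_INITIAL - 1] for t in traces) / N_REPEATS
mean_final = sum(t[-1] for t in traces) / N_REPEATS
print(f"mean best after {N_INITIAL} samples: {mean_initial:.2f}")
print(f"mean best after {MAX_EVALS} samples: {mean_final:.2f}")
```

For random search the initial samples are drawn in the same way as the later ones; for a surrogate-based method, the first `N_INITIAL` evaluations would instead form the initial design used to fit the surrogate.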

For the solvent design case study, the relatively large number of constraints makes comparisons with the aforementioned solvers impractical. Instead, PWAS is compared with a recently proposed DoE-QM-CAMD method.

Suzuki–Miyaura cross-coupling

[Figure: Suzuki–Miyaura cross-coupling case study]

  • Design space: fully categorical

  • Optimization goal: to identify the optimal combinations of precursors that maximize the yield of the desired product

  • Parameters to optimize: aryl halide (X), boronic acid derivative (Y), base, ligand, and solvent

  • Notes on the code:

    • Relevant folder: suzuki_edbo
    • The files needed to run each optimization method are included:
      • run_xx.py: run xx opt. method to solve the case study
      • for Random Search, Genetic, Hyperopt, and BoTorch, run_xx.py files are based on the files included in the Olympus package
      • for EDBO, run_edbo.py is based on the file included in the EDBO package
    • The results and the files used to generate figures are available at z_results
  • Results:

[Figures: results for the Suzuki–Miyaura cross-coupling case study, three panels]

Crossed barrel

[Figure: crossed-barrel case study]

  • Design space: mixed-integer

  • Optimization goal: to identify the optimal combinations of structure parameters that maximize the toughness of the resulting crossed-barrel structure without exceeding a specified force threshold

  • Parameters to optimize:

    • number of hollow columns ($n$), twist angle of the columns ($\theta$), outer radius of the columns ($r$), and thickness of the hollow columns ($t$)
  • Notes on the code:

    • Relevant folder: crossed_barrel
    • The files needed to run each solver are included.
      • corssed_barrel_othersolvers.py: run Random Search, Genetic, Hyperopt, and BoTorch to solve the case study. This file is based on the file included in the Olympus package
      • for EDBO, crossedBarrel_ebdo.py is based on the file included in the EDBO package
    • The results and the files used to generate figures are available at z_results
  • Results:

[Figure: results for the crossed-barrel case study]

Solvent design

[Figure: solvent design case study]

  • Design space: mixed-integer and categorical

  • Optimization goal: to identify optimal solvent compositions that enhance the reaction rate of the Menschutkin reaction of phenacyl bromide and pyridine

  • Variables to optimize:

    • 46 integer variables indicating the number of each atom group present in the designed solvent
    • 1 auxiliary categorical variable to delineate the solvent's structure (acyclic, monocyclic, bicyclic)
    • 7 auxiliary binary variables for structure-related constraints
    • These variables are subject to 115 linear inequality constraints and 5 linear equality constraints that enforce structure-property, chemical-feasibility, and complexity-related solution features.
      • For instance, constraints are used to enforce the octet rule and to set a minimum value of the octanol/water partition coefficient, among other relevant properties.
      • See the detailed list in Gui et al., 2023
      • Also formatted in this Excel file
  • Notes on the code:

    • Relevant folder: solvent design case study
    • main.py: run PWAS to solve the case study
    • gc_lnkCal.py: calculate ln(k) from group contributions
    • qm_simulator.py: return the ln(k) value for a given solvent structure; the value is obtained from quantum-mechanical (QM) calculations
    • solvent_list_matrix.xlsx: Excel file including the full feasible design space, bounds and constraints on the optimization variables, group contribution values
    • The results and the files used to generate figures are available at z_results
  • Results

[Figures: three panels, described in the caption below]

Solvent properties of the initial samples (left), the first 10 active-learning samples (middle), and the last 10 active-learning samples (right): $n^2$: refractive index at 298K, $B$: Abraham’s overall hydrogen-bond basicity, $\epsilon$: dielectric constant at 298K.

[Figure: bubble chart, described in the caption below]

Bubble chart of chemical properties of the solvents: $n^2$: refractive index at 298K, $\epsilon$: dielectric constant at 298K. Abraham’s overall hydrogen-bond basicity is represented by the size of each bubble, with the relevant bubble size scale shown in the legend.
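
The linear inequality and equality constraints listed among the design variables of this case study can be pictured as a feasibility test of the form A_ineq x ≤ b_ineq and A_eq x = b_eq on a candidate vector x. The sketch below uses a hypothetical three-variable problem and a hypothetical helper `is_feasible`; it does not reproduce the 115 inequality and 5 equality constraints of the case study, and it is not how PWAS enforces constraints internally.

```python
def dot(row, x):
    """Inner product of one constraint row with the candidate vector."""
    return sum(a * v for a, v in zip(row, x))


def is_feasible(x, A_ineq, b_ineq, A_eq, b_eq, tol=1e-9):
    """Check A_ineq x <= b_ineq and A_eq x == b_eq within a tolerance."""
    ineq_ok = all(dot(row, x) <= b + tol for row, b in zip(A_ineq, b_ineq))
    eq_ok = all(abs(dot(row, x) - b) <= tol for row, b in zip(A_eq, b_eq))
    return ineq_ok and eq_ok


# Hypothetical toy constraints on three integer group counts x1, x2, x3.
A_ineq = [[1, 1, 1]]   # x1 + x2 + x3 <= 6 (a complexity-style bound)
b_ineq = [6]
A_eq = [[1, -1, 0]]    # x1 - x2 == 0 (a balance-style equality)
b_eq = [0]

print(is_feasible([2, 2, 1], A_ineq, b_ineq, A_eq, b_eq))  # True
print(is_feasible([3, 2, 1], A_ineq, b_ineq, A_eq, b_eq))  # False: equality violated
print(is_feasible([3, 3, 1], A_ineq, b_ineq, A_eq, b_eq))  # False: sum exceeds 6
```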

Authors

Mengjia Zhu, Austin Mroz, Lingfeng Gui, Kim Jelfs, Alberto Bemporad, Ehecatl Antonio del Río Chanona, and Ye Seol Lee

This repository is distributed without any warranty. Please cite the paper below if you use it.

Citing the material

@article{ExpDesign2024,
author ="Zhu, Mengjia and Mroz, Austin and Gui, Lingfeng and Jelfs, Kim E. and Bemporad, Alberto and del Río Chanona, Ehecatl Antonio and Lee, Ye Seol",
title  ="Discrete and mixed-variable experimental design with surrogate-based approach",
journal  ="Digital Discovery",
year  ="2024",
volume  ="3",
issue  ="12",
pages  ="2589-2606",
publisher  ="RSC",
doi  ="10.1039/D4DD00113C",
url  ="http://dx.doi.org/10.1039/D4DD00113C",
}

License

MIT
