benchopt · MatDag · Feb 19, 2024
diff --git a/README.rst b/README.rst
@@ -4,20 +4,40 @@ Bilevel Optimization Benchmark
 
 *Results can be consulted on https://benchopt.github.io/results/benchmark_bilevel.html*
 
-BenchOpt is a package to simplify and make more transparent and
+BenchOpt is a package to simplify and to make more transparent and
 reproducible the comparisons of optimization algorithms.
 This benchmark is dedicated to solvers for bilevel optimization:
 
 $$\\min_{x} f(x, z^*(x)) \\quad \\text{with} \\quad z^*(x) = \\arg\\min_z g(x, z), $$
 
-where $g$ and $f$ are two functions of two variables.
+where $g$ and $f$ are two functions of two variables.
 
 Different problems
 ------------------
 
-This benchmark currently implements two bilevel optimization problems: regularization selection, and hyper data cleaning.
+This benchmark currently implements three bilevel optimization problems: a simulated quadratic problem, regularization selection, and hyper data cleaning.
 
-1 - Regularization selection
+1 - Simulated bilevel problem
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this problem, the inner and outer functions are quadratic functions defined on $\\mathbb{R}^{d}\\times\\mathbb{R}^{p}$:
+
+$$g(x, z) = \\frac{1}{n}\\sum_{i=1}^n \\frac{1}{2} z^\\top H_i^z z + \\frac{1}{2} x^\\top H_i^x x + x^\\top C_i z + c_i^\\top z + d_i^\\top x$$
+
+and
+
+$$f(x, z) = \\frac{1}{m} \\sum_{j=1}^m \\frac{1}{2} z^\\top \\tilde H_j^z z + \\frac{1}{2} x^\\top \\tilde H_j^x x + x^\\top \\tilde C_j z + \\tilde c_j^\\top z + \\tilde d_j^\\top x$$
+
+where $H_i^z, \\tilde H_j^z$ are symmetric positive definite matrices of size $p\\times p$, $H_i^x, \\tilde H_j^x$ are symmetric positive definite matrices of size $d\\times d$, $C_i, \\tilde C_j$ are matrices of size $d\\times p$, $c_i, \\tilde c_j$ are vectors of size $p$, and $d_i, \\tilde d_j$ are vectors of size $d$.
+
+The matrices $H_i^z, H_i^x, \\tilde H_j^z, \\tilde H_j^x$ are generated randomly such that the eigenvalues of $\\frac1n\\sum_i H_i^z$ are between ``mu_inner`` and ``L_inner_inner``, the eigenvalues of $\\frac1n\\sum_i H_i^x$ are between ``mu_inner`` and ``L_inner_outer``, the eigenvalues of $\\frac1m\\sum_j \\tilde H_j^z$ are between ``mu_inner`` and ``L_outer_inner``, and the eigenvalues of $\\frac1m\\sum_j \\tilde H_j^x$ are between ``mu_inner`` and ``L_outer_outer``.
+
+The matrices $C_i, \\tilde C_j$ are generated randomly such that the spectral norm of $\\frac1n\\sum_i C_i$ is lower than ``L_cross_inner``, and the spectral norm of $\\frac1m\\sum_j \\tilde C_j$ is lower than ``L_cross_outer``.
+
+Note that in this setting, the solution of the inner problem is obtained by solving a linear system. Moreover, the full-batch inner and outer functions can be cheaply computed by storing the averages of the Hessian matrices. Thus, the value function can be cheaply evaluated in closed form in medium dimension.
+
+
+2 - Regularization selection
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 In this problem, the inner function $g$ is defined by 
@@ -41,7 +61,7 @@ Covtype
 
 *Homepage : https://archive.ics.uci.edu/dataset/31/covertype*
 
-This is a logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with  $a_i\\in\\mathbb{R}^p$ are the features and $y_i=\\pm1$ is the binary target.
+This is a logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$, where $a_i\\in\\mathbb{R}^p$ are the features and $y_i=\\pm1$ is the binary target.
 For this problem, the loss is $\\ell(d_i, z) = \\log(1+\\exp(-y_i a_i^T z))$, and the regularization is simply given by
 $$\\mathcal{R}(x, z) = \\frac12\\sum_{j=1}^p\\exp(x_j)z_j^2,$$
 each coefficient in $z$ is independently regularized with the strength $\\exp(x_j)$.
@@ -51,18 +71,18 @@ Ijcnn1
 
 *Homepage : https://www.openml.org/search?type=data&sort=runs&id=1575&status=active*
 
-This is a multicalss logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with  $a_i\\in\\mathbb{R}^p$ are the features and $y_i\\in \\{1,\\dots, k\\}$ is the integer target, with k the number of classes.
+This is a multiclass logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$, where $a_i\\in\\mathbb{R}^p$ are the features and $y_i\\in \\{1,\\dots, k\\}$ is the integer target, with $k$ the number of classes.
 For this problem, the loss is $\\ell(d_i, z) = \\text{CrossEntropy}(za_i, y_i)$ where $z$ is now a k x p matrix. The regularization is given by 
 $$\\mathcal{R}(x, z) = \\frac12\\sum_{j=1}^k\\exp(x_j)\\|z_j\\|^2,$$
 each line in $z$ is independently regularized with the strength $\\exp(x_j)$.
 
 
-2 - Hyper data cleaning
+3 - Hyper data cleaning
 ^^^^^^^^^^^^^^^^^^^^^^^
 
 This problem was first introduced by [Fra2017]_ .
 In this problem, the data is the MNIST dataset.
-The training set has been corrupted: with a probability $p$, the label of the image $y\\in\\{1,\\dots,10\\}$ is replaced by another random label between 1 and 10.
+The training set has been corrupted: with a probability $p$, the label of the image $y\\in\\{1,\\dots,10\\}$ is replaced by another random label between 1 and 10.
 We do not know beforehand which data has been corrupted.
 We have a clean testing set, which has not been corrupted.
 The goal is to fit a model on the corrupted training data that has good performances on the test set.
@@ -91,7 +111,7 @@ This benchmark can be run using the following commands:
    $ git clone https://github.com/benchopt/benchmark_bilevel
    $ benchopt run benchmark_bilevel
 
-Apart from the problem, options can be passed to `benchopt run`, to restrict the benchmarks to some solvers or datasets, e.g.:
+Apart from the problem, options can be passed to ``benchopt run``, to restrict the benchmarks to some solvers or datasets, e.g.:
 
 .. code-block::
 
@@ -103,10 +123,19 @@ You can also use config files to setup the benchmark run:
 
    $ benchopt run benchmark_bilevel --config config/X.yml
 
-where `X.yml` is a config file. See https://benchopt.github.io/index.html#run-a-benchmark for an example of a config file. This will possibly launch a huge grid search. When available, you can rather use the file `X_best_params.yml` in order to launch an experiment with a single set of parameters for each solver.
+where ``X.yml`` is a config file. See https://benchopt.github.io/index.html#run-a-benchmark for an example of a config file. This may launch a huge grid search. When available, you can instead use the file ``X_best_params.yml`` to launch an experiment with a single set of parameters for each solver.
 
-Use `benchopt run -h` for more details about these options, or visit https://benchopt.github.io/api.html.
+Use ``benchopt run -h`` for more details about these options, or visit https://benchopt.github.io/api.html.
 
+How to contribute to the benchmark?
+-----------------------------------
+
+If you think that a solver is missing, or if you want to add a new problem, feel free to open a pull request or an issue!
+
+1 - How to add a new solver?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Stochastic solver: see the detailed explanations in the `AmIGO solver <solvers/amigo.py>`_.
+* Other solver: see the detailed explanation in the `Benchopt documentation <https://benchopt.github.io/tutorials/add_solver.html>`_.
 
 Cite
 ----
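
The README text added above says that, for the simulated quadratic problem, the inner solution comes from a linear system and the value function can be evaluated in closed form. A minimal NumPy sketch of that computation follows, assuming the averaged matrices are already available. The helper names (`random_spd`, `inner_solution`, `value_function`) and the toy dimensions are illustrative and are not taken from the benchmark code.

```python
import numpy as np


def random_spd(dim, mu, L, rng):
    """Random symmetric positive definite matrix, eigenvalues in [mu, L]."""
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    eigvals = np.linspace(mu, L, dim)
    return (Q * eigvals) @ Q.T  # Q diag(eigvals) Q^T


def inner_solution(x, Hz, C, c):
    """Solve grad_z g(x, z) = Hz z + C^T x + c = 0 for z."""
    return -np.linalg.solve(Hz, C.T @ x + c)


def outer_value(x, z, Hz_t, Hx_t, C_t, c_t, d_t):
    """Full-batch outer function f(x, z) built from averaged matrices."""
    return (0.5 * z @ Hz_t @ z + 0.5 * x @ Hx_t @ x
            + x @ C_t @ z + c_t @ z + d_t @ x)


def value_function(x, inner, outer):
    """Return h(x) = f(x, z*(x)) and its gradient (implicit differentiation)."""
    Hz, C, c = inner
    Hz_t, Hx_t, C_t, c_t, d_t = outer
    z = inner_solution(x, Hz, C, c)
    grad_x = Hx_t @ x + C_t @ z + d_t
    grad_z = Hz_t @ z + C_t.T @ x + c_t
    # dz*/dx = -Hz^{-1} C^T, hence the correction term -C Hz^{-1} grad_z.
    grad = grad_x - C @ np.linalg.solve(Hz, grad_z)
    return outer_value(x, z, *outer), grad


# Toy instance: x has size d (outer variable), z has size p (inner variable).
rng = np.random.default_rng(0)
d, p = 3, 5
inner = (
    random_spd(p, 0.1, 1.0, rng),          # averaged H^z, size p x p
    0.1 * rng.standard_normal((d, p)),     # averaged C, size d x p
    rng.standard_normal(p),                # averaged c, size p
)
outer = (
    random_spd(p, 0.1, 1.0, rng),          # averaged tilde H^z
    random_spd(d, 0.1, 1.0, rng),          # averaged tilde H^x
    0.1 * rng.standard_normal((d, p)),     # averaged tilde C
    rng.standard_normal(p),                # averaged tilde c
    rng.standard_normal(d),                # averaged tilde d
)
x = rng.standard_normal(d)
h, grad_h = value_function(x, inner, outer)
print(h, np.linalg.norm(grad_h))
```

The terms of $g$ that depend only on $x$ do not affect $z^*(x)$, which is why the sketch needs only the averaged $H^z$, $C$, and $c$ for the inner problem.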

diff --git a/benchmark_utils/stochastic_jax_solver.py b/benchmark_utils/stochastic_jax_solver.py
@@ -98,10 +98,28 @@ def set_objective(self, f_inner, f_outer, n_inner_samples, n_outer_samples,
 
         inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)
 
-        f_inner_fb, f_outer_fb: callable
-            Full batch version of f_inner and f_outer. Should take as input:
-                * inner_var: array-like, shape (dim_inner,)
-                * outer_var: array-like, shape (dim_outer,)
+        Attributes
+        ----------
+        f_inner, f_outer: callable
+            Inner and outer objective function for the bilevel optimization
+            problem.
+
+        n_inner_samples, n_outer_samples: int
+            Number of samples to draw for the inner and outer objective
+            functions.
+
+        inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)
+
+        batch_size_inner, batch_size_outer: int
+            Size of the minibatch to use for the inner and outer objective
+            functions.
+
+        state_inner_sampler, state_outer_sampler: dict
+            State of the minibatch samplers for the inner and outer objectives.
+
+        one_epoch: callable
+            Jitted function that runs the solver for one epoch. One epoch is
+            defined as `eval_freq` iterations of the solver.
         """
 
         self.f_inner = f_inner

diff --git a/config/quadratics_021424_best_params.yml b/config/quadratics_021424_best_params.yml
@@ -3,10 +3,10 @@ objective:
 dataset:
   - quadratic[L_cross_inner=0.1,L_cross_outer=0.1,mu_inner=[.1],n_samples_inner=[32768],n_samples_outer=[1024],dim_inner=100,dim_outer=10]
 solver:
-  - AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=1.0,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]] 
+  - AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=0.1,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]]
   - MRBO[batch_size=64,eta=0.5,eval_freq=16,framework=none,n_shia_steps=10,outer_ratio=0.1,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
   - SABA[batch_size=64,eval_freq=64,framework=none,mode_init_memory=zero,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
-  - SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=0.1,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
+  - SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=1.0,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
   - StocBiO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
   - VRBO[batch_size=64,eval_freq=2,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,period_frac=0.01,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
   - F2SA[batch_size=64,delta_lmbda=0.01,eval_freq=16,framework=none,lmbda0=1,n_inner_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
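
The solver entries in this config mix scalar parameters with bracketed lists such as `random_state=[1,2,...]`, so a naive split on commas would cut the list apart. The sketch below splits only on top-level commas. The helpers `split_top_level` and `parse_spec` are hypothetical names introduced for illustration; this is not benchopt's own parser, and values are left as strings.

```python
def split_top_level(text, sep=","):
    """Split `text` on `sep`, ignoring separators nested inside brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def parse_spec(spec):
    """Parse 'Name[k1=v1,k2=[a,b]]' into (name, params); values stay strings."""
    name, bracket, rest = spec.strip().partition("[")
    if not bracket:
        return name, {}
    # Drop the closing bracket that matches the one after the name.
    body = rest[:-1] if rest.endswith("]") else rest
    params = dict(item.split("=", 1) for item in split_top_level(body))
    return name, params


spec = ("SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=1.0,"
        "period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]")
name, params = parse_spec(spec)
print(name, params["outer_ratio"], params["random_state"])
```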

diff --git a/figures/plot_quadratics.py b/figures/plot_quadratics.py
@@ -0,0 +1,201 @@
+from pathlib import Path
+
+import numpy as np
+import pandas as pd
+import matplotlib as mpl
+import matplotlib.pyplot as plt
+
+mpl.rc('text', usetex=True)
+
+FILE_NAME = Path(__file__).with_suffix('')
+METRIC = 'objective_value'
+
+# DEFAULT_WIDTH = 3.25
+DEFAULT_WIDTH = 3
+DEFAULT_HEIGHT = 2
+LEGEND_RATIO = 0.1
+
+N_POINTS = 500
+X_LIM = 250
+
+# Utils to get common STYLES object and setup matplotlib
+# for all plots
+
+mpl.rcParams.update({
+    'font.size': 10,
+    'legend.fontsize': 'small',
+    'axes.labelsize': 'small',
+    'xtick.labelsize': 'small',
+    'ytick.labelsize': 'small'
+})
+
+STYLES = {
+    '*': dict(lw=1.5),
+
+    'amigo': dict(color='#5778a4', label=r'AmIGO'),
+    'mrbo': dict(color='#e49444', label=r'MRBO'),
+    'vrbo': dict(color='#e7ca60', label=r'VRBO'),
+    'saba': dict(color='#d1615d', label=r'SABA'),
+    'stocbio': dict(color='#85b6b2', label=r'StocBiO'),
+    'srba': dict(color='#6a9f58', label=r'\textbf{SRBA}', lw=2),
+    'f2sa': dict(color='#bcbd22', label=r'F2SA'),
+}
+
+
+def get_param(name, param='period_frac'):
+    params = {}
+    for vals in name.split("[", maxsplit=1)[1][:-1].split(","):
+        k, v = vals.split("=")
+        if v.replace(".", "").isnumeric():
+            params[k] = float(v)
+        else:
+            params[k] = v
+    return params[param]
+
+
+def drop_param(name, param='period_frac'):
+    new_name = name.split("[", maxsplit=1)[0] + '['
+    for vals in name.split("[", maxsplit=1)[1][:-1].split(","):
+        k, v = vals.split("=")
+        if k != param:
+            new_name += f'{k}={v},'
+    return new_name[:-1] + ']'
+
+
+if __name__ == "__main__":
+    fname = "quadratic.parquet"
+    fname = FILE_NAME.parent / fname
+
+    if Path(f'{fname.stem}_stable.parquet').is_file():
+        df = pd.read_parquet(f'{fname.stem}_stable.parquet')
+        print(f'{fname.stem}_stable.parquet')
+    else:
+        df = pd.read_parquet(fname)
+        print(fname)
+
+        # normalize names
+        df['solver'] = df['solver_name'].apply(
+            lambda x: x.split('[')[0].lower()
+        )
+        df['seed_solver'] = df['solver_name'].apply(
+            lambda x: get_param(x, 'random_state')
+        )
+        df['seed_data'] = df['data_name'].apply(
+            lambda x: get_param(x, 'random_state')
+        )
+
+        df['solver_name'] = df['solver_name'].apply(
+            lambda x: drop_param(x, 'random_state')
+        )
+        df['data_name'] = df['data_name'].apply(
+            lambda x: drop_param(x, 'random_state')
+        )
+        df['cond'] = df['data_name'].apply(
+            lambda x: get_param(x, 'L_inner_inner')/get_param(x, 'mu_inner')
+        )
+        df['n_inner'] = df['data_name'].apply(
+            lambda x: get_param(x, 'n_samples_inner')
+        )
+        df['n_outer'] = df['data_name'].apply(
+            lambda x: get_param(x, 'n_samples_outer')
+        )
+        df['n_tot'] = df['n_inner'] + df['n_outer']
+
+        # keep only the runs that cover all the random seeds
+        df['full'] = False
+        n_seeds = df.groupby('solver_name')['seed_data'].nunique()
+        n_seeds *= df.groupby('solver_name')['seed_solver'].nunique()
+        for s in n_seeds.index:
+            if n_seeds[s] == 10:
+                df.loc[df['solver_name'] == s, 'full'] = True
+        df = df.query('full == True')
+        df.to_parquet(f'{fname.stem}_stable.parquet')
+
+    fig = plt.figure(
+        figsize=(DEFAULT_WIDTH, DEFAULT_HEIGHT * (1 + LEGEND_RATIO))
+    )
+
+    gs = plt.GridSpec(
+        len(df['n_tot'].unique()), len(df['cond'].unique()),
+        height_ratios=[1] * len(df['n_tot'].unique()),
+        width_ratios=[1] * len(df['cond'].unique()),
+        hspace=0.5, wspace=0.3
+    )
+
+    lines = []
+    for i, n_tot in enumerate(df['n_tot'].unique()):
+        for j, cond in enumerate(df['cond'].unique()):
+            df_pb = df.query("cond == @cond & n_tot == @n_tot")
+            print(f"Cond: {cond}, n: {df_pb['n_inner'].iloc[0]}, "
+                  + f"m: {df_pb['n_outer'].iloc[0]}")
+            to_plot = (
+                df.query("cond == @cond & n_tot == @n_tot & stop_val <= 100")
+                .groupby(['solver', 'solver_name', 'data_name', 'stop_val'])
+                .median(numeric_only=True)
+                .reset_index().sort_values(METRIC)
+                .groupby('solver').first()[['solver_name']]
+            )
+            (
+                df.query("solver_name in @to_plot.values.ravel()")
+                .to_parquet(f'{fname.stem}_best_params.parquet')
+            )
+            print("Chosen parameters:")
+            for s in to_plot['solver_name']:
+                print(f"- {s}")
+            ax = fig.add_subplot(gs[i, j])
+            for solver_name in to_plot['solver_name']:
+                df_solver = df_pb.query("solver_name == @solver_name")
+                solver = df_solver['solver'].iloc[0]
+                style = STYLES['*'].copy()
+                style.update(STYLES[solver])
+                curves = [data[['time', METRIC]].values
+                          for _, data in df_solver.groupby(['seed_data',
+                                                            'seed_solver'])]
+                vals = [c[:, 1] for c in curves]
+                times = [c[:, 0] for c in curves]
+                tmin = np.min([np.min(t) for t in times])
+                tmax = np.max([np.max(t) for t in times])
+                time_grid = np.linspace(np.log(tmin), np.log(tmax + 1),
+                                        N_POINTS)
+                interp_vals = np.zeros((len(times), N_POINTS))
+                for k, (t, val) in enumerate(zip(times, vals)):
+                    interp_vals[k] = np.exp(np.interp(time_grid, np.log(t),
+                                            np.log(val)))
+                time_grid = np.exp(time_grid)
+                medval = np.quantile(interp_vals, .5, axis=0)
+                q1 = np.quantile(interp_vals, .2, axis=0)
+                q2 = np.quantile(interp_vals, .8, axis=0)
+                if i == 0 and j == 0:
+                    lines.append(ax.semilogy(
+                        time_grid, np.sqrt(medval),
+                        **style
+                    )[0])
+                else:
+                    ax.semilogy(
+                        time_grid, np.sqrt(medval),
+                        **style
+                    )
+                ax.fill_between(
+                    time_grid,
+                    np.sqrt(q1),
+                    np.sqrt(q2),
+                    color=style['color'], alpha=0.3
+                )
+                ax.set_xlabel('Time (s)')
+                ax.set_ylabel(r'$\|\nabla h(x^t)\|$')
+                print(f"Min score ({solver}):", df_solver[METRIC].min())
+            ax.grid()
+            ax.set_xlim([0, X_LIM])
+
+            if i == 0 and j == 0:
+                ax_legend = ax.legend(
+                    handles=lines,
+                    ncol=2,
+                    prop={'size': 6.5}
+                )
+    print(f"Saving {fname.with_suffix('.pdf')}")
+    fig.savefig(
+        fname.with_suffix('.pdf'),
+        bbox_inches='tight',
+        bbox_extra_artists=[ax_legend]
+    )
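
The script above aggregates runs by interpolating each curve on a shared grid in log-log space and then taking quantiles across runs. A standalone sketch of that step is given below, assuming strictly positive, increasing times and positive values. The function name `median_curve` is illustrative, and unlike the script it uses `tmax` rather than `tmax + 1` as the upper end of the grid.

```python
import numpy as np


def median_curve(times, vals, n_points=500):
    """Median and 20%/80% quantile curves on a common log-spaced time grid.

    `times` and `vals` are lists of 1D arrays, one pair per run.
    """
    tmin = min(t.min() for t in times)
    tmax = max(t.max() for t in times)
    log_grid = np.linspace(np.log(tmin), np.log(tmax), n_points)
    # Interpolate log(val) against log(time); np.interp holds the end values
    # constant outside the range covered by a given run.
    interp = np.stack([
        np.exp(np.interp(log_grid, np.log(t), np.log(v)))
        for t, v in zip(times, vals)
    ])
    q_lo, med, q_hi = np.quantile(interp, [0.2, 0.5, 0.8], axis=0)
    return np.exp(log_grid), med, q_lo, q_hi


# Three synthetic runs following k / t, sampled on different time grids.
times = [np.linspace(1.0, 100.0, n) for n in (50, 80, 120)]
vals = [k / t for k, t in zip((1.0, 2.0, 3.0), times)]
grid, med, q_lo, q_hi = median_curve(times, vals, n_points=200)
print(grid[0], grid[-1], med[0], med[-1])
```

Interpolating in log-log space keeps power-law decay linear between samples, which suits curves that are later drawn on logarithmic axes.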