MTN update readme #40

Open
wants to merge 51 commits into
base: main
Changes from 20 commits
8f70bb1
ENH plot_quadratic and update config file
MatDag Feb 19, 2024
61806e0
ENH add quadratics to readme
MatDag Feb 19, 2024
0ecc679
FIX latex readme
MatDag Feb 19, 2024
2307ccc
FIX latex readme
MatDag Feb 19, 2024
391f6cb
Merge branch 'main' into update_readme
MatDag Oct 11, 2024
f394a6d
FIX typo
MatDag Oct 11, 2024
8da9ac5
FIX typo
MatDag Oct 11, 2024
047c83e
FIX double backslash
MatDag Oct 11, 2024
0d8b556
FIX typo
MatDag Oct 11, 2024
17e2267
ENH doc eigenvalues of matrices
MatDag Oct 11, 2024
89f0966
FIX typo
MatDag Oct 11, 2024
479e1d5
ENH value function evaluation nit that expensive
MatDag Oct 11, 2024
76ac1fe
FIX double backquotes
MatDag Oct 11, 2024
8dc98c2
WIP doc
MatDag Oct 14, 2024
df74427
WIP complete doc how to create a solver
MatDag Oct 14, 2024
dd3380c
ENH add comments amigo
MatDag Oct 14, 2024
beb83f1
ENH readme
MatDag Oct 14, 2024
c77ba62
ENH docstring StochasticJaxSolver
MatDag Oct 14, 2024
05dfa21
ENH comment amigo
MatDag Oct 14, 2024
38f2197
FIX flake8
MatDag Oct 14, 2024
d4b1b35
FIX review suggestions README.rst
MatDag Oct 16, 2024
d093b8b
CLN create template_stochastic_solver and moove explanation from AmIGO
MatDag Oct 16, 2024
e0d785c
ENH add template_solver.py
MatDag Oct 17, 2024
f333a7c
ENH add template_dataset.py
MatDag Oct 17, 2024
c31f5cc
ENH ref to benchopt template
MatDag Oct 17, 2024
931e095
Update README.rst
tomMoral Oct 18, 2024
143ca61
ENH apply suggestion readme
MatDag Oct 18, 2024
481880f
ENH replace rst by md
MatDag Oct 18, 2024
ae2ce5d
FIX brackets
MatDag Oct 18, 2024
b0cb39a
FIX brackets
MatDag Oct 18, 2024
928b70a
FIX brackets
MatDag Oct 18, 2024
a29f393
FIX brackets
MatDag Oct 18, 2024
1038359
FIX brackets
MatDag Oct 18, 2024
50ca4aa
Update README.md
MatDag Oct 18, 2024
cd580c1
Update README.md
MatDag Oct 18, 2024
5f68c11
CLN remove tilde
MatDag Oct 18, 2024
0d1ab03
FIX ref
MatDag Oct 18, 2024
7d9508e
CLN remove useless params
MatDag Oct 18, 2024
59001b3
WIP
MatDag Oct 18, 2024
2fac0da
ENH simplify template_dataset
MatDag Oct 22, 2024
3d7bded
FIX typo
MatDag Oct 22, 2024
10b3bc0
FIX batched_quadratics disappeared in simulated.py...
MatDag Oct 22, 2024
5b80320
CLN remove plot_quadratics.py
MatDag Oct 22, 2024
302d8af
ENH rm generate_matrices
MatDag Oct 23, 2024
a7769ec
CLN docstring
MatDag Oct 23, 2024
7a881a8
FIX flake8
MatDag Oct 23, 2024
9b4c556
ENH callback info template dataset
MatDag Oct 24, 2024
510c812
ENH lr_scheduler template_stochastic_solver
MatDag Oct 24, 2024
e918f5a
ENH add comments oracles
MatDag Nov 21, 2024
f65755e
ENH docstring init
MatDag Nov 21, 2024
24c08fa
ENH docstring get_step
MatDag Nov 21, 2024
51 changes: 40 additions & 11 deletions README.rst
@@ -4,20 +4,40 @@ Bilevel Optimization Benchmark

*Results can be consulted on https://benchopt.github.io/results/benchmark_bilevel.html*

BenchOpt is a package to simplify and make more transparent and
BenchOpt is a package to simplify, and make more transparent, and
reproducible the comparisons of optimization algorithms.
This benchmark is dedicated to solvers for bilevel optimization:

$$\\min_{x} f(x, z^*(x)) \\quad \\text{with} \\quad z^*(x) = \\arg\\min_z g(x, z), $$

where $g$ and $f$ are two functions of two variables.
where $g$, and $f$ are two functions of two variables.

Different problems
------------------

This benchmark currently implements two bilevel optimization problems: regularization selection, and hyper data cleaning.
This benchmark currently implements three bilevel optimization problems: quadratic problem, regularization selection, and hyper data cleaning.

1 - Regularization selection
1 - Simulated bilevel problem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this problem, the inner and outer functions are quadratic functions defined on $\\mathbb{R}^{d}\\times\\mathbb{R}^{p}$:

$$g(x, z) = \\frac{1}{n}\\sum_{i=1}^n \\frac{1}{2} z^\\top H_i^z z + \\frac{1}{2} x^\\top H_i^x x + x^\\top C_i z + c_i^\\top z + d_i^\\top x$$

and

$$f(x, z) = \\frac{1}{m} \\sum_{j=1}^m \\frac{1}{2} z^\\top \\tilde H_j^z z + \\frac{1}{2} x^\\top \\tilde H_j^x x + x^\\top \\tilde C_j z + \\tilde c_j^\\top z + \\tilde d_j^\\top x$$

where $H_i^z, \\tilde H_j^z$ are symmetric positive definite matrices of size $p\\times p$, $H_i^x, \\tilde H_j^x$ are symmetric positive definite matrices of size $d\\times d$, $C_i, \\tilde C_j$ are matrices of size $d\\times p$, $c_i, \\tilde c_j$ are vectors of size $p$, and $d_i, \\tilde d_j$ are vectors of size $d$.

The matrices $H_i^z, H_i^x, \\tilde H_j^z, \\tilde H_j^x$ are generated randomly such that the eigenvalues of $\\frac1n\\sum_i H_i^z$ are between ``mu_inner`` and ``L_inner_inner``, the eigenvalues of $\\frac1n\\sum_i H_i^x$ are between ``mu_inner`` and ``L_inner_outer``, the eigenvalues of $\\frac1m\\sum_j \\tilde H_j^z$ are between ``mu_inner`` and ``L_outer_inner``, and the eigenvalues of $\\frac1m\\sum_j \\tilde H_j^x$ are between ``mu_inner`` and ``L_outer_outer``.

The matrices $C_i, \\tilde C_j$ are generated randomly such that the spectral norm of $\\frac1n\\sum_i C_i$ is lower than ``L_cross_inner``, and the spectral norm of $\\frac1m\\sum_j \\tilde C_j$ is lower than ``L_cross_outer``.

Note that in this setting, the solution of the inner problem is obtained by solving a linear system. Moreover, the full-batch inner and outer functions can be computed cheaply by storing the averages of the Hessian matrices. Thus, the value function can be evaluated cheaply in closed form in medium dimension.
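
As a rough illustration of this closed form, the sketch below works directly with already-averaged terms. Setting the gradient of the inner function with respect to the inner variable to zero gives the linear system solved in ``inner_solution``. Every name here (``random_spd``, ``inner_solution``, ``value_function``), the sizes, and the random draws are assumptions made for the example; this is not the benchmark's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 3, 5  # hypothetical sizes of the outer variable x and inner variable z


def random_spd(size, mu, L):
    """Symmetric matrix whose eigenvalues are evenly spread over [mu, L]."""
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q @ np.diag(np.linspace(mu, L, size)) @ q.T


# Averaged inner terms: H_z is (p, p), C is (d, p), c has size p.
H_z = random_spd(p, 0.1, 1.0)
C = 0.1 * rng.standard_normal((d, p))
c = rng.standard_normal(p)

# Averaged outer terms, with the same shapes plus Ht_x (d, d) and dt (size d).
Ht_z = random_spd(p, 0.1, 1.0)
Ht_x = random_spd(d, 0.1, 1.0)
Ct = 0.1 * rng.standard_normal((d, p))
ct = rng.standard_normal(p)
dt = rng.standard_normal(d)


def inner_solution(x):
    """Solve H_z z + C^T x + c = 0, the optimality condition of g in z."""
    return np.linalg.solve(H_z, -(C.T @ x + c))


def value_function(x):
    """Evaluate h(x) = f(x, z*(x)) from the averaged outer terms."""
    z = inner_solution(x)
    return (0.5 * z @ Ht_z @ z + 0.5 * x @ Ht_x @ x
            + x @ Ct @ z + ct @ z + dt @ x)


x = rng.standard_normal(d)
z_star = inner_solution(x)
print("h(x) =", value_function(x))
```

Each evaluation costs one linear solve of size p, which is why this stays affordable only in medium dimension.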

2 - Regularization selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this problem, the inner function $g$ is defined by
@@ -41,7 +61,7 @@ Covtype

*Homepage : https://archive.ics.uci.edu/dataset/31/covertype*

This is a logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with $a_i\\in\\mathbb{R}^p$ are the features and $y_i=\\pm1$ is the binary target.
This is a logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with $a_i\\in\\mathbb{R}^p$ are the features, and $y_i=\\pm1$ is the binary target.
For this problem, the loss is $\\ell(d_i, z) = \\log(1+\\exp(-y_i a_i^T z))$, and the regularization is simply given by
$$\\mathcal{R}(x, z) = \\frac12\\sum_{j=1}^p\\exp(x_j)z_j^2,$$
each coefficient in $z$ is independently regularized with the strength $\\exp(x_j)$.
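
A minimal sketch of this penalty, assuming the outer variable ``x`` and the inner variable ``z`` are plain sequences of floats of the same length. The function name ``coefficientwise_penalty`` is hypothetical and is not part of the benchmark.

```python
import math


def coefficientwise_penalty(x, z):
    """Return 0.5 * sum_j exp(x_j) * z_j ** 2."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return 0.5 * sum(math.exp(xj) * zj ** 2 for xj, zj in zip(x, z))


# With x = 0 every coefficient is regularized with strength exp(0) = 1.
print(coefficientwise_penalty([0.0, 0.0], [1.0, 2.0]))
```

Parametrizing the strength as ``exp(x_j)`` keeps it positive for any real ``x_j``, so the outer problem stays unconstrained.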
@@ -51,18 +71,18 @@ Ijcnn1

*Homepage : https://www.openml.org/search?type=data&sort=runs&id=1575&status=active*

This is a multiclass logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with $a_i\\in\\mathbb{R}^p$ are the features and $y_i\\in \\{1,\\dots, k\\}$ is the integer target, with k the number of classes.
This is a multiclass logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$ with $a_i\\in\\mathbb{R}^p$ are the features, and $y_i\\in \\{1,\\dots, k\\}$ is the integer target, with k the number of classes.
For this problem, the loss is $\\ell(d_i, z) = \\text{CrossEntropy}(za_i, y_i)$ where $z$ is now a k x p matrix. The regularization is given by
$$\\mathcal{R}(x, z) = \\frac12\\sum_{j=1}^k\\exp(x_j)\\|z_j\\|^2,$$
each line in $z$ is independently regularized with the strength $\\exp(x_j)$.


2 - Hyper data cleaning
3 - Hyper data cleaning
^^^^^^^^^^^^^^^^^^^^^^^

This problem was first introduced by [Fra2017]_ .
In this problem, the data is the MNIST dataset.
The training set has been corrupted: with a probability $p$, the label of the image $y\\in\\{1,\\dots,10\\}$ is replaced by another random label between 1 and 10.
The training set has been corrupted: with a probability $p$, the label of the image $y\\in\\{1,\\dots,10\\}$ is replaced by another random label between 1, and 10.
We do not know beforehand which data has been corrupted.
We have a clean testing set, which has not been corrupted.
The goal is to fit a model on the corrupted training data that performs well on the test set.
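
The corruption step described above could be sketched as follows. This is an illustration only: the helper name ``corrupt_labels``, its ``seed`` argument, and the choice to draw the replacement uniformly among the other labels are assumptions, not the benchmark's actual code.

```python
import random


def corrupt_labels(labels, p, n_classes=10, seed=0):
    """Replace each label by a different random label with probability p."""
    rng = random.Random(seed)
    corrupted = []
    for y in labels:
        if rng.random() < p:
            # Draw among the other labels so that the label really changes.
            others = [c for c in range(1, n_classes + 1) if c != y]
            corrupted.append(rng.choice(others))
        else:
            corrupted.append(y)
    return corrupted


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noisy = corrupt_labels(clean, p=0.5)
print(sum(a != b for a, b in zip(clean, noisy)), "labels were changed")
```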
@@ -91,7 +111,7 @@ This benchmark can be run using the following commands:
$ git clone https://github.com/benchopt/benchmark_bilevel
$ benchopt run benchmark_bilevel

Apart from the problem, options can be passed to `benchopt run`, to restrict the benchmarks to some solvers or datasets, e.g.:
Apart from the problem, options can be passed to ``benchopt run``, to restrict the benchmarks to some solvers or datasets, e.g.:

.. code-block::

@@ -103,10 +123,19 @@ You can also use config files to setup the benchmark run:

$ benchopt run benchmark_bilevel --config config/X.yml

where `X.yml` is a config file. See https://benchopt.github.io/index.html#run-a-benchmark for an example of a config file. This will possibly launch a huge grid search. When available, you can rather use the file `X_best_params.yml` in order to launch an experiment with a single set of parameters for each solver.
where ``X.yml`` is a config file. See https://benchopt.github.io/index.html#run-a-benchmark for an example of a config file. This will possibly launch a huge grid search. When available, you can rather use the file ``X_best_params.yml`` in order to launch an experiment with a single set of parameters for each solver.

Use `benchopt run -h` for more details about these options, or visit https://benchopt.github.io/api.html.
Use ``benchopt run -h`` for more details about these options, or visit https://benchopt.github.io/api.html.

How to contribute to the benchmark?
-----------------------------------

If you think that a solver is missing, or if you want to add a new problem, feel free to open a pull request or an issue!

1 - How to add a new solver?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Stochastic solver: see the detailed explanations in the `AmIGO solver <solvers/amigo.py>`_.
* Other solver: see the detailed explanation in the `Benchopt documentation <https://benchopt.github.io/tutorials/add_solver.html>`_.

Cite
----
26 changes: 22 additions & 4 deletions benchmark_utils/stochastic_jax_solver.py
@@ -98,10 +98,28 @@ def set_objective(self, f_inner, f_outer, n_inner_samples, n_outer_samples,

inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)

f_inner_fb, f_outer_fb: callable
Full batch version of f_inner and f_outer. Should take as input:
* inner_var: array-like, shape (dim_inner,)
* outer_var: array-like, shape (dim_outer,)
Attributes
----------
f_inner, f_outer: callable
Inner and outer objective functions for the bilevel optimization
problem.

n_inner_samples, n_outer_samples: int
Number of samples to draw for the inner and outer objective
functions.

inner_var0, outer_var0: array-like, shape (dim_inner,) (dim_outer,)

batch_size_inner, batch_size_outer: int
Size of the minibatch to use for the inner and outer objective
functions.

state_inner_sampler, state_outer_sampler: dict
State of the minibatch samplers for the inner and outer objectives.

one_epoch: callable
Jitted function that runs the solver for one epoch. One epoch is
defined as `eval_freq` iterations of the solver.
"""

self.f_inner = f_inner
4 changes: 2 additions & 2 deletions config/quadratics_021424_best_params.yml
@@ -3,10 +3,10 @@ objective:
dataset:
- quadratic[L_cross_inner=0.1,L_cross_outer=0.1,mu_inner=[.1],n_samples_inner=[32768],n_samples_outer=[1024],dim_inner=100,dim_outer=10]
solver:
- AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=1.0,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]]
- AmIGO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,outer_ratio=0.1,step_size=0.01,random_state=[1,2,3,4,5,6,7,8,9,10]]
- MRBO[batch_size=64,eta=0.5,eval_freq=16,framework=none,n_shia_steps=10,outer_ratio=0.1,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SABA[batch_size=64,eval_freq=64,framework=none,mode_init_memory=zero,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=0.1,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- SRBA[batch_size=64,eval_freq=64,framework=none,outer_ratio=1.0,period_frac=0.5,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- StocBiO[batch_size=64,eval_freq=16,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- VRBO[batch_size=64,eval_freq=2,framework=none,n_inner_steps=10,n_shia_steps=10,outer_ratio=1.0,period_frac=0.01,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
- F2SA[batch_size=64,delta_lmbda=0.01,eval_freq=16,framework=none,lmbda0=1,n_inner_steps=10,outer_ratio=1.0,step_size=0.1,random_state=[1,2,3,4,5,6,7,8,9,10]]
201 changes: 201 additions & 0 deletions figures/plot_quadratics.py
@@ -0,0 +1,201 @@
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rc('text', usetex=True)

FILE_NAME = Path(__file__).with_suffix('')
METRIC = 'objective_value'

# DEFAULT_WIDTH = 3.25
DEFAULT_WIDTH = 3
DEFAULT_HEIGHT = 2
LEGEND_RATIO = 0.1

N_POINTS = 500
X_LIM = 250

# Utils to get common STYLES object and setup matplotlib
# for all plots

mpl.rcParams.update({
    'font.size': 10,
    'legend.fontsize': 'small',
    'axes.labelsize': 'small',
    'xtick.labelsize': 'small',
    'ytick.labelsize': 'small'
})

STYLES = {
    '*': dict(lw=1.5),

    'amigo': dict(color='#5778a4', label=r'AmIGO'),
    'mrbo': dict(color='#e49444', label=r'MRBO'),
    'vrbo': dict(color='#e7ca60', label=r'VRBO'),
    'saba': dict(color='#d1615d', label=r'SABA'),
    'stocbio': dict(color='#85b6b2', label=r'StocBiO'),
    'srba': dict(color='#6a9f58', label=r'\textbf{SRBA}', lw=2),
    'f2sa': dict(color='#bcbd22', label=r'F2SA'),
}


def get_param(name, param='period_frac'):
    params = {}
    for vals in name.split("[", maxsplit=1)[1][:-1].split(","):
        k, v = vals.split("=")
        if v.replace(".", "").isnumeric():
            params[k] = float(v)
        else:
            params[k] = v
    return params[param]


def drop_param(name, param='period_frac'):
    new_name = name.split("[", maxsplit=1)[0] + '['
    for vals in name.split("[", maxsplit=1)[1][:-1].split(","):
        k, v = vals.split("=")
        if k != param:
            new_name += f'{k}={v},'
    return new_name[:-1] + ']'


if __name__ == "__main__":
    fname = "quadratic.parquet"
    fname = FILE_NAME.parent / fname

    if Path(f'{fname.stem}_stable.parquet').is_file():
        df = pd.read_parquet(f'{fname.stem}_stable.parquet')
        print(f'{fname.stem}_stable.parquet')
    else:
        df = pd.read_parquet(fname)
        print(fname)

        # normalize names
        df['solver'] = df['solver_name'].apply(
            lambda x: x.split('[')[0].lower()
        )
        df['seed_solver'] = df['solver_name'].apply(
            lambda x: get_param(x, 'random_state')
        )
        df['seed_data'] = df['data_name'].apply(
            lambda x: get_param(x, 'random_state')
        )

        df['solver_name'] = df['solver_name'].apply(
            lambda x: drop_param(x, 'random_state')
        )
        df['data_name'] = df['data_name'].apply(
            lambda x: drop_param(x, 'random_state')
        )
        df['cond'] = df['data_name'].apply(
            lambda x: get_param(x, 'L_inner_inner')/get_param(x, 'mu_inner')
        )
        df['n_inner'] = df['data_name'].apply(
            lambda x: get_param(x, 'n_samples_inner')
        )
        df['n_outer'] = df['data_name'].apply(
            lambda x: get_param(x, 'n_samples_outer')
        )
        df['n_tot'] = df['n_inner'] + df['n_outer']

        # keep only the runs that have all the random seeds
        df['full'] = False
        n_seeds = df.groupby('solver_name')['seed_data'].nunique()
        n_seeds *= df.groupby('solver_name')['seed_solver'].nunique()
        for s in n_seeds.index:
            if n_seeds[s] == 10:
                df.loc[df['solver_name'] == s, 'full'] = True
        df = df.query('full == True')
        df.to_parquet(f'{fname.stem}_stable.parquet')

    fig = plt.figure(
        figsize=(DEFAULT_WIDTH, DEFAULT_HEIGHT * (1 + LEGEND_RATIO))
    )

    gs = plt.GridSpec(
        len(df['n_tot'].unique()), len(df['cond'].unique()),
        height_ratios=[1] * len(df['n_tot'].unique()),
        width_ratios=[1] * len(df['cond'].unique()),
        hspace=0.5, wspace=0.3
    )

    lines = []
    for i, n_tot in enumerate(df['n_tot'].unique()):
        for j, cond in enumerate(df['cond'].unique()):
            df_pb = df.query("cond == @cond & n_tot == @n_tot")
            print(f"Cond: {cond}, n: {df_pb['n_inner'].iloc[0]}, "
                  + f"m: {df_pb['n_outer'].iloc[0]}")
            to_plot = (
                df.query("cond == @cond & n_tot == @n_tot & stop_val <= 100")
                .groupby(['solver', 'solver_name', 'data_name', 'stop_val'])
                .median(numeric_only=True)
                .reset_index().sort_values(METRIC)
                .groupby('solver').first()[['solver_name']]
            )
            (
                df.query("solver_name in @to_plot.values.ravel()")
                .to_parquet(f'{fname.stem}_best_params.parquet')
            )
            print("Chosen parameters:")
            for s in to_plot['solver_name']:
                print(f"- {s}")
            ax = fig.add_subplot(gs[i, j])
            for solver_name in to_plot['solver_name']:
                df_solver = df_pb.query("solver_name == @solver_name")
                solver = df_solver['solver'].iloc[0]
                style = STYLES['*'].copy()
                style.update(STYLES[solver])
                curves = [data[['time', METRIC]].values
                          for _, data in df_solver.groupby(['seed_data',
                                                            'seed_solver'])]
                vals = [c[:, 1] for c in curves]
                times = [c[:, 0] for c in curves]
                tmin = np.min([np.min(t) for t in times])
                tmax = np.max([np.max(t) for t in times])
                time_grid = np.linspace(np.log(tmin), np.log(tmax + 1),
                                        N_POINTS)
                interp_vals = np.zeros((len(times), N_POINTS))
                for k, (t, val) in enumerate(zip(times, vals)):
                    interp_vals[k] = np.exp(np.interp(time_grid, np.log(t),
                                                      np.log(val)))
                time_grid = np.exp(time_grid)
                medval = np.quantile(interp_vals, .5, axis=0)
                q1 = np.quantile(interp_vals, .2, axis=0)
                q2 = np.quantile(interp_vals, .8, axis=0)
                if i == 0 and j == 0:
                    lines.append(ax.semilogy(
                        time_grid, np.sqrt(medval),
                        **style
                    )[0])
                else:
                    ax.semilogy(
                        time_grid, np.sqrt(medval),
                        **style
                    )
                ax.fill_between(
                    time_grid,
                    np.sqrt(q1),
                    np.sqrt(q2),
                    color=style['color'], alpha=0.3
                )
                ax.set_xlabel('Time (s)')
                ax.set_ylabel(r'$\|\nabla h(x^t)\|$')
                print(f"Min score ({solver}):", df_solver[METRIC].min())
            ax.grid()
            ax.set_xlim([0, X_LIM])

            if i == 0 and j == 0:
                ax_legend = ax.legend(
                    handles=lines,
                    ncol=2,
                    prop={'size': 6.5}
                )
    print(f"Saving {fname.with_suffix('.pdf')}")
    fig.savefig(
        fname.with_suffix('.pdf'),
        bbox_inches='tight',
        bbox_extra_artists=[ax_legend]
    )
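
The ``get_param`` and ``drop_param`` helpers in this script rely on names of the form ``Solver[key=value,...]`` with one scalar value per key. A standalone sketch of that parsing is given below, assuming this format. The helper name ``parse_name`` is hypothetical, and unlike ``get_param`` it tries ``float`` on every value, so negative numbers and scientific notation are also converted.

```python
def parse_name(name):
    """Split 'Solver[key=value,...]' into the base name and a params dict."""
    base, _, rest = name.partition("[")
    params = {}
    if rest:
        for item in rest.rstrip("]").split(","):
            key, value = item.split("=", maxsplit=1)
            try:
                params[key] = float(value)
            except ValueError:
                # Non-numeric values such as 'none' are kept as strings.
                params[key] = value
    return base, params


base, params = parse_name(
    "AmIGO[batch_size=64,framework=none,step_size=0.01,random_state=3]"
)
print(base, params["step_size"], params["framework"])
```

List-valued parameters such as ``random_state=[1,2,3]`` are not handled, because the inner commas would be split as well.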