Let us assume that you have defined an objective function as in:
def myfunction(lr, num_layers, arg3, arg4, other_anything):
    ...
    return -accuracy  # something to minimize
You should define how it must be instrumented, i.e. which arguments you want to optimize and on which space they are defined. If you have both continuous and discrete parameters and a good initial guess, a reasonable default is to use OrderedDiscrete for all discrete variables (yes, even if they are not ordered), Array for all your continuous variables, and PortfolioDiscreteOnePlusOne as the optimizer.
import nevergrad as ng
# instrument learning rate and number of layers, keep arg3 to 3 and arg4 to 4
lr = ng.var.Array(1).asscalar().bounded(0, 3).exponentiated(base=10, coeff=-1) # log distributed between 0.001 and 1
num_layers = ng.var.OrderedDiscrete([4, 5, 6])
instrumentation = ng.Instrumentation(lr, num_layers, 3., arg4=4)
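The log-distributed range quoted in the comment on lr can be checked by hand. The sketch below is plain Python, independent of nevergrad, and assumes that exponentiated(base=10, coeff=-1) maps a value x to base ** (coeff * x), which is consistent with the 0.001 to 1 range stated above. The helper name is only an illustration.

```python
def exponentiated(x, base=10.0, coeff=-1.0):
    """Assumed form of the transform: base ** (coeff * x)."""
    return base ** (coeff * x)


# Values bounded in [0, 3] end up between 1 (x = 0) and 0.001 (x = 3),
# and equal steps in x give equal ratios in the learning rate.
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, exponentiated(x))
```

Because the mapping is monotonic, the two bounds of x give the two extremes of the learning rate.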
Just take care that the default value (your initial guess) is the middle of the list of possible values for OrderedDiscrete, and 0 for Array (you can modify this with Array methods). You can check that things are correct by verifying that the all-zeros vector yields the default:
args, kwargs = instrumentation.data_to_arguments([0] * instrumentation.dimension)
print(args, kwargs)
Using ordered discrete variables is not a big deal here because, by nature, PortfolioDiscreteOnePlusOne will ignore the order. This algorithm is quite stable.
If you have more budget, a cool possibility is to use SoftmaxCategorical for all discrete variables and then apply TwoPointsDE. You might also compare this to DE (classical differential evolution). This might need a budget in the hundreds.
If you want to double-check that you are not worse than random search, you might use RandomSearch.
If you want something fully parallel (the number of workers can be equal to the budget), then you might use ScrHammersleySearch, which includes the discrete case. In that case, you should use OrderedDiscrete rather than SoftmaxCategorical. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature, ScrHammersleySearch will deal correctly with the OrderedDiscrete type for discrete variables.
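To illustrate why a Hammersley-type search is fully parallel and more uniform than random sampling, here is a minimal sketch of an unscrambled Hammersley point set in plain Python. It is not nevergrad's ScrHammersleySearch (which adds scrambling), and the helper to_ordered_choice is a hypothetical way of mapping a coordinate in [0, 1) onto an ordered list of values.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the digits of index (in the given base) around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def hammersley(num_points, dimension, primes=(2, 3, 5, 7, 11, 13)):
    """Unscrambled Hammersley set: first coordinate i / n, the others are radical inverses in prime bases."""
    if dimension - 1 > len(primes):
        raise ValueError("not enough prime bases for this dimension")
    return [[i / num_points] + [radical_inverse(i, p) for p in primes[:dimension - 1]]
            for i in range(num_points)]


def to_ordered_choice(u, values):
    """Hypothetical mapping from u in [0, 1) to one of the ordered values."""
    return values[min(int(u * len(values)), len(values) - 1)]


points = hammersley(4, 2)
print(points)
print([to_ordered_choice(p[1], [4, 5, 6]) for p in points])
```

No point depends on the result of a previous evaluation, so all of them can be evaluated at the same time.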
If you are optimizing weights in reinforcement learning, you might use TBPSA (high noise) or CMA (low noise).
The first example is simply the optimization of continuous hyperparameters. It is also presented in an asynchronous setting. All other examples are based on the ask and tell interface, which can be synchronous or not but relies on the user for setting up asynchronicity.
The second example is the optimization of mixed (continuous and discrete) hyperparameters.
The third example is the optimization of parameters in a noisy setting, typically as in reinforcement learning.
Let's first define our test case:
import nevergrad as ng
import numpy as np
# Optimization of continuous hyperparameters.
print("Optimization of continuous hyperparameters =========")
def train_and_return_test_error(x):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x])
instrumentation = ng.Instrumentation(ng.var.Array(300)) # optimize on R^300
budget = 1200 # How many trainings we will do before concluding.
names = ["RandomSearch", "TwoPointsDE", "CMA", "PSO", "ScrHammersleySearch"]
We will compare several algorithms (defined in names).
RandomSearch is well known and ScrHammersleySearch is a quasi-random method; these two methods are fully parallel, i.e. we can perform the 1200 trainings in parallel.
CMA and PSO are classical optimization algorithms, and TwoPointsDE is Differential Evolution equipped with a 2-points crossover.
A complete list is available in ng.optimizers.registry.
for name in names:
    optim = ng.optimizers.registry[name](instrumentation=instrumentation, budget=budget)
    for u in range(budget // 3):
        x1 = optim.ask()
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = train_and_return_test_error(*x1.args, **x1.kwargs)  # here we only defined an arg, so we could omit kwargs
        y2 = train_and_return_test_error(*x2.args, **x2.kwargs)  # (keeping it here for the sake of consistency)
        y3 = train_and_return_test_error(*x3.args, **x3.kwargs)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", name, " provides a vector of parameters with test error ",
          train_and_return_test_error(*recommendation.args, **recommendation.kwargs))
from concurrent import futures
for name in names:
    optim = ng.optimizers.registry[name](instrumentation=instrumentation, budget=budget)
    # the executor will evaluate the function in multiple threads
    with futures.ThreadPoolExecutor(max_workers=optim.num_workers) as executor:
        recommendation = optim.minimize(train_and_return_test_error, executor=executor)
    print("* ", name, " provides a vector of parameters with test error ",
          train_and_return_test_error(*recommendation.args, **recommendation.kwargs))
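The comments above mention doing "one more ask as soon as a training is over". A minimal sketch of that asynchronous pattern follows, using only the standard library. ToyRandomSearch is a hypothetical stand-in that exposes ask, tell and recommend; it is not nevergrad's implementation, and minimize_async and toy_loss are illustrative names as well.

```python
import random
from concurrent import futures


class ToyRandomSearch:
    """Hypothetical optimizer with an ask/tell/recommend interface (not nevergrad's)."""

    def __init__(self, dimension, seed=0):
        self._rng = random.Random(seed)
        self._dimension = dimension
        self._best = None  # (loss, point) of the best evaluation told so far
        self.num_tell = 0

    def ask(self):
        return [self._rng.gauss(0.0, 1.0) for _ in range(self._dimension)]

    def tell(self, point, loss):
        self.num_tell += 1
        if self._best is None or loss < self._best[0]:
            self._best = (loss, point)

    def recommend(self):
        return self._best[1]


def toy_loss(x):
    return sum((x_ - 0.2) ** 2 for x_ in x)


def minimize_async(optim, function, budget, num_workers):
    """Keep num_workers evaluations in flight; ask again as soon as one finishes."""
    with futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        running = {}  # future -> point that was asked
        asked = 0
        while optim.num_tell < budget:
            while asked < budget and len(running) < num_workers:
                point = optim.ask()
                running[executor.submit(function, point)] = point
                asked += 1
            done, _ = futures.wait(list(running), return_when=futures.FIRST_COMPLETED)
            for future in done:
                # Only points that were asked are ever told.
                optim.tell(running.pop(future), future.result())
    return optim.recommend()


optim = ToyRandomSearch(dimension=2)
best = minimize_async(optim, toy_loss, budget=60, num_workers=3)
print(best, toy_loss(best))
```

All calls to ask and tell happen in the main thread here, so the toy optimizer itself does not need to be thread-safe.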
Let's define our function:
import numpy as np
# Let us define a function.
def myfunction(arg1, arg2, arg3, value=3):
    return np.abs(value) + (1 if arg1 != "a" else 0) + (1 if arg2 != "e" else 0)
This function must then be instrumented in order to let the optimizer know what the arguments are:
import nevergrad as ng
# argument transformation
# Optimization of mixed (continuous and discrete) hyperparameters.
arg1 = ng.var.OrderedDiscrete(["a", "b"]) # 1st arg. = positional discrete argument
# We apply a softmax for converting real numbers to discrete values.
arg2 = ng.var.SoftmaxCategorical(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.var.Gaussian(mean=1, std=2) # the 4th arg. is a keyword argument with Gaussian prior
# create the instrumentation
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
instrumentation = ng.Instrumentation(arg1, arg2, "blublu", value=value)
print(instrumentation.dimension) # 5 dimensional space
The dimension is 5 because:
- the 1st discrete var. has 2 possible values, represented by a hard thresholding in a 1-dimensional space, i.e. we add 1 coordinate to the continuous problem
- the 2nd discrete var. has 3 possible values, represented by softmax, i.e. we add 3 coordinates to the continuous problem
- the 3rd var. has no uncertainty, so it does not introduce any coordinate in the continuous problem
- the 4th var. is a real number, represented by a single coordinate.
args, kwargs = instrumentation.data_to_arguments([1, -80, -80, 80, 3])
print(args, kwargs)
>>> ('b', 'e', 'blublu') {'value': 7}
myfunction(*args, **kwargs)
>>> 8
In this case:
- args[0] == "b" because 1 > 0 (the threshold is 0 here since there are 2 values).
- args[1] == "e" is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) = 1.
- args[2] == "blublu" because it is kept constant.
- value == 7 because std * 3 + mean = 2 * 3 + 1 = 7.
The function therefore returns 7 + 1 = 8.
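The decoding rules listed above can be reproduced by hand. The sketch below is plain Python and only mimics those rules (threshold at 0, softmax sampling, std * x + mean); the function decode is an illustration, not nevergrad's data_to_arguments.

```python
import math
import random


def decode(data, rng=random):
    """Decode a 5-dimensional vector following the rules described above."""
    # 1 coordinate: hard threshold at 0 between the two ordered values.
    arg1 = ["a", "b"][1 if data[0] > 0 else 0]
    # 3 coordinates: turned into probabilities with a numerically stable softmax.
    logits = data[1:4]
    shifted = [math.exp(v - max(logits)) for v in logits]
    probas = [e / sum(shifted) for e in shifted]
    arg2 = rng.choices(["a", "c", "e"], weights=probas, k=1)[0]
    # 1 coordinate: rescaled by the Gaussian prior (mean=1, std=2).
    value = 2 * data[4] + 1
    return (arg1, arg2, "blublu"), {"value": value}, probas


args, kwargs, probas = decode([1, -80, -80, 80, 3])
print(args, kwargs)
print(probas)
```

With logits of -80, -80 and 80, the probability of "e" rounds to 1 in floating point, so the sampled value matches the one shown above for all practical purposes.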
Then you can run the optimization as usual. PortfolioDiscreteOnePlusOne is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use OrderedDiscrete rather than SoftmaxCategorical.
TwoPointsDE is often excellent in the large scale case (budget in the hundreds).
import nevergrad as ng
budget = 1200  # How many episodes we will do before concluding.
for name in ["RandomSearch", "ScrHammersleySearch", "TwoPointsDE", "PortfolioDiscreteOnePlusOne", "CMA", "PSO"]:
    optim = ng.optimizers.registry[name](instrumentation=instrumentation, budget=budget)
    for u in range(budget // 3):
        x1 = optim.ask()
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = myfunction(*x1.args, **x1.kwargs)  # both args and kwargs are needed here ("value" is a keyword argument)
        y2 = myfunction(*x2.args, **x2.kwargs)
        y3 = myfunction(*x3.args, **x3.kwargs)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", name, " provides a vector of parameters with test error ",
          myfunction(*recommendation.args, **recommendation.kwargs))
You always have the possibility to define your own instrumentation inside your function (not recommended):
def softmax(x, possible_values=None):
    expx = [np.exp(x_ - max(x)) for x_ in x]
    probas = [e / sum(expx) for e in expx]
    # returns an array of size 1 holding the sampled value (or index)
    return np.random.choice(len(x) if possible_values is None
                            else possible_values, size=1, p=probas)


def train_and_return_test_error_mixed(x):
    cx = [x_ - 0.1 for x_ in x[3:]]
    activation = softmax(x[:3], ["tanh", "sigmoid", "relu"])[0]
    return np.linalg.norm(cx) + (1. if activation != "tanh" else 0.)

instrumentation = 10  # you can just provide the size of your input in this case

# This version is bigger.
def train_and_return_test_error_mixed(x):
    cx = x[:(len(x) // 2)]  # continuous part.
    presoftmax_values = x[(len(x) // 2):]  # discrete part.
    values_for_this_softmax = []
    dx = []
    for g in presoftmax_values:
        values_for_this_softmax += [g]
        if len(values_for_this_softmax) > 4:  # one discrete variable per group of 5 coordinates
            dx.append(softmax(values_for_this_softmax)[0])
            values_for_this_softmax = []
    # the penalty must be a scalar, so sum the per-variable penalties
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in cx]) + sum(
        1 if d != 1 else 0 for d in dx)

instrumentation = 300
We do not average evaluations over multiple episodes: the algorithm is in charge of averaging, if need be.
TBPSA, based on population-control mechanisms, performs quite well in this case.
import nevergrad as ng
import numpy as np
# Similar, but with a noisy case: typically a case in which we train in reinforcement learning.
# This is about parameters rather than hyperparameters. TBPSA is a strong candidate in this case.
# We do *not* manually average over multiple evaluations; the algorithm will take care
# of averaging or reevaluate whatever it wants to reevaluate.
print("Optimization of parameters in reinforcement learning ===============")
def simulate_and_return_test_error_with_rl(x, noisy=True):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x]) + noisy * len(x) * np.random.normal()
budget = 1200 # How many trainings we will do before concluding.
for tool in ["TwoPointsDE", "RandomSearch", "TBPSA", "CMA", "NaiveTBPSA",
             "PortfolioNoisyDiscreteOnePlusOne"]:
    optim = ng.optimizers.registry[tool](instrumentation=300, budget=budget)
    for u in range(budget // 3):
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x1 = optim.ask()
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = simulate_and_return_test_error_with_rl(*x1.args)
        y2 = simulate_and_return_test_error_with_rl(*x2.args)
        y3 = simulate_and_return_test_error_with_rl(*x3.args)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", tool, " provides a vector of parameters with test error ",
          simulate_and_return_test_error_with_rl(*recommendation.args, noisy=False))
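As a side illustration of why reevaluation helps in the noisy case, the sketch below shows that averaging n independent noisy evaluations of the same point shrinks the noise roughly as 1 / sqrt(n). It is plain Python with hypothetical names (noisy_loss, averaged_loss) and a made-up noise model; it says nothing about how TBPSA or any other optimizer actually schedules its reevaluations.

```python
import random


def noisy_loss(rng, true_loss=1.0, noise_std=1.0):
    """Hypothetical noisy evaluation: the true loss plus Gaussian noise."""
    return true_loss + rng.gauss(0.0, noise_std)


def averaged_loss(rng, num_evaluations):
    """Average several noisy evaluations of the same point."""
    return sum(noisy_loss(rng) for _ in range(num_evaluations)) / num_evaluations


rng = random.Random(0)
for n in (1, 100, 10000):
    print(n, averaged_loss(rng, n))
```

The printed averages should get closer to the true loss of 1.0 as n grows, which is the effect a noise-aware optimizer exploits internally.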