Pareto optimization #475
base: main
Conversation
385bebe to e633f92
first round of comments
abbreviation: ClassVar[str] = "qLogNEHVI"

ref_point: float | tuple[float, ...] | None = field(
do we want to include the `prune_baseline` keyword? Sounds useful, or it could always be set to true depending on our preference. Any merit to including some of the other keywords that might be useful? Thinking of `eta`, `alpha`, or `fat`.
Yes, why not, let's agree on a subset. I'd say let's include `prune_baseline` with `True` as default. But I would not include anything that we don't yet fully understand ourselves / stuff that does not primarily affect the optimization. So if you ask me, I'd leave it at that. Opinions?
Looking at our implementation of `X_baseline`, we definitely need the pruning to be true as we don't do any pre-selection; perhaps there's no need to make it configurable, tbh. This made me look up what is done for the noisy non-HV variant `qNEI`, because we have that included and also set the baseline to just all train data. Strangely, there the default for `prune_baseline` is `True`, while the HV variant here has it set to `False`. So I would ensure it's `True` by hardcoding it in base.py.
For the other parameters, I think the only one I would possibly provide access to is `alpha`. If the other HV variants are also included, they need an alpha passed to `partitioning`. To simplify things, we could also not make `alpha` configurable but set the value according to a heuristic: it seems it should be 0.0 for m <= 5, and then we could add a linear increase until m = 20 or so.
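A minimal sketch of what such a heuristic could look like (the cap of 0.1 at m = 20 and the function name are placeholders, not actual proposed numbers):

```python
def _default_alpha(n_targets: int) -> float:
    """Placeholder heuristic: exact partitioning (alpha = 0.0) up to 5 targets,
    then a linear ramp towards an assumed maximum of 0.1 at 20 targets."""
    if n_targets <= 5:
        return 0.0
    return min(0.1, 0.1 * (n_targets - 5) / 15)
```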
But then I think it's more elegant to just add it to our `qLogNEHVI` wrapper with default value `True`, just like we did for the scalar version. That way, we have a consistent, useful default while still being configurable.
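Something along these lines, assuming the wrapper stays an attrs class as in the diff above (sketch only, not the final code):

```python
from typing import ClassVar

from attrs import define, field
from attrs.validators import instance_of


@define(frozen=True)
class qLogNEHVI:
    """Sketch of the wrapper; the real class derives from our acqf base class."""

    abbreviation: ClassVar[str] = "qLogNEHVI"

    ref_point: float | tuple[float, ...] | None = field(default=None)
    """Optional reference point for the hypervolume computation."""

    prune_baseline: bool = field(default=True, validator=instance_of(bool))
    """Prune dominated points from the baseline before evaluation."""
```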
absolutely, let's use this. So `alpha` becomes a property and not an attribute, right?
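i.e. something in this direction (minimal stand-in illustration, reusing the placeholder heuristic sketched further up)?

```python
from attrs import define


def _default_alpha(n_targets: int) -> float:
    # same placeholder heuristic as sketched further up
    return 0.0 if n_targets <= 5 else min(0.1, 0.1 * (n_targets - 5) / 15)


@define
class _HvAcqfSketch:
    """Stand-in class: alpha is derived from the number of targets, not stored."""

    n_targets: int

    @property
    def alpha(self) -> float:
        return _default_alpha(self.n_targets)
```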
@@ -61,6 +63,8 @@
    "qUpperConfidenceBound",
    # Thompson Sampling
    "qThompsonSampling",
    # Hypervolume Improvement
    "qLogNoisyExpectedHypervolumeImprovement",
why not include `qExpectedHypervolumeImprovement` too?
there are in fact 4 choices: with and without `log`, and with and without `noisy`. The reason why I haven't included the non-noisy versions yet is that I haven't really gotten into the partitioning mechanics required for those. Do you already have some insights to share here?
can you explain why this matters? Does the implementation here require anything different for, e.g., just swapping one of the other functions in?
yes, it requires passing an explicit partitioning object. Probably not a big deal, though, I just haven't had the time yet to fully understand the underlying mechanism. I guess this is analogous to the 1D case where for the regular EI you pass `best_f` but for the noisy version you don't. In that sense, the partitioning would act like the multidimensional generalization of `best_f`. Whoever of us gets there first can add the logic 👍🏼
ok, didn't realize that. I don't understand yet why the other methods require a partitioning, but there are exact and approximate utilities that essentially only depend on `ref_point` and `Y`, so in principle there should be no obstacle to computing a property that provides the partitioning.
it further seems that the interface differences might be due to legacy reasons or so: for the noisy variant you will find an `alpha` parameter which has the same role as the alpha parameter of the partitioning utility. So it appears the partitioning is just done internally there, which imo would justify just hardcoding the partitioning to be e.g. `FastNondominatedPartitioning`.
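For reference, building the partitioning from `ref_point` and `Y` with BoTorch would look roughly like this (standalone sketch with made-up tensors):

```python
import torch
from botorch.utils.multi_objective.box_decompositions.non_dominated import (
    FastNondominatedPartitioning,
)

# Made-up two-target example: reference point and observed objective values,
# both in maximization orientation as BoTorch expects.
ref_point = torch.tensor([0.0, 0.0])
Y = torch.rand(20, 2)

partitioning = FastNondominatedPartitioning(ref_point=ref_point, Y=Y)

# The non-noisy variants would then receive this object explicitly, e.g.
# qLogExpectedHypervolumeImprovement(model=model, ref_point=ref_point, partitioning=partitioning)
```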
examples/Multi_Target/pareto.py (Outdated)
searchspace=searchspace,
objective=ParetoObjective([y0, y1]),
recommender=BotorchRecommender(
    acquisition_function=qLogNEHVI(),
do we have the option to make an HVI-based acqf the default in case there is a multi-output objective?
Thanks for the reminder. Actually wanted to do this but forgot. Now it's included
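Roughly this kind of dispatch (illustrative sketch only, not the actual diff; the helper name and import paths are assumptions, the class names follow the PR):

```python
from baybe.acquisition import (
    qLogExpectedImprovement,
    qLogNoisyExpectedHypervolumeImprovement,
)
from baybe.objectives.base import Objective


def _default_acqf(objective: Objective):
    """Assumed selection rule: HVI-based acqf for multi-output objectives."""
    if len(objective.targets) > 1:
        return qLogNoisyExpectedHypervolumeImprovement()
    return qLogExpectedImprovement()
```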
The question, though, is what the actual defaults should be; perhaps we should discuss this. What do you think?
In analogy to our single-task default, the default here should be the non-noisy log variant, but tbh I don't mind if there's evidence or practical preference for another one. We could also debate whether the single-task default should be the noisy log variant, but this eventually becomes a question like with the priors, where we should look at the outcomes on our benchmarks. So that decision can be postponed until those are ready.
hmm ok, fine if we don't have consistent defaults at the moment. As I said, I'd investigate and potentially change this after the benchmarking is more complete. At least one colleague already told me some time ago that noisy EI delivered better results, so the hope would be that it's also better across the board there.
Detached comment 2: no particular tests? The Pareto objective is not tested: no hypothesis strategy for it, and no integration tests like iterations (unless that's done automatically, but I don't think we have tests that iterate over objective types).
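As a starting point, a basic creation test could look something like this (import path for `ParetoObjective` assumed; just to sketch the kind of coverage meant here, a hypothesis strategy would come on top):

```python
from baybe.objectives import ParetoObjective  # import path assumed
from baybe.targets import NumericalTarget


def test_pareto_objective_creation():
    """Sketch: the objective accepts MIN/MAX targets and preserves them."""
    targets = [
        NumericalTarget(name="t_1", mode="MIN"),
        NumericalTarget(name="t_2", mode="MAX"),
    ]
    objective = ParetoObjective(targets=targets)
    assert [t.name for t in objective.targets] == ["t_1", "t_2"]
```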
189bbe5 to 614a660
614a660 to 758fb7d
Co-authored-by: Martin Fitzner <[email protected]>
The ref_point is now in the original target space so that the user can intuitively specify its coordinates. Sign flips for minimization targets happen behind the scenes.
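Conceptually, the conversion amounts to something like this (illustrative only, not the actual implementation):

```python
import torch

# User-facing reference point in original target space for a (MIN, MAX) target pair
ref_point = torch.tensor([10.0, 0.5])

# Internally, coordinates of minimization targets are flipped so that BoTorch
# can treat everything as maximization.
maximize = torch.tensor([False, True])
botorch_ref_point = torch.where(maximize, ref_point, -ref_point)

print(botorch_ref_point)  # tensor([-10.0000,   0.5000])
```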
27ae08c to 711935f
weights. By contrast, the Pareto approach allows specifying this trade-off
*after* the experiments have been carried out, giving the user the flexibility to adjust
their preferences post-hoc – knowing that each of the obtained points is optimal
with respect to a particular preference model. In this sense, the
would drop the last sentence; doesn't seem necessary / a bit opinionated
target_2 = NumericalTarget(name="t_2", mode="MAX")
objective = ParetoObjective(targets=[target_1, target_2])
Imo it's important that this restricts the acqf choices. So I would cross-reference here that the optimization is achieved via special acqfs, linking to the autodoc or later on to the acqf user guide.
This PR finally brings Pareto optimization via the new `ParetoObjective` class, together with an example comparing Pareto vs. single-target optimization for a simple pair of synthetic targets.

Note: Support is currently limited to maximization and minimization targets. Match targets will follow but require a refactoring of the corresponding target transformation mechanism.
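For context, the usage pattern from the included example looks roughly like this (condensed sketch: the parameter/search-space definition and import paths are assumptions, the objective/recommender wiring follows the example file in the diff):

```python
from baybe import Campaign
from baybe.acquisition import qLogNEHVI
from baybe.objectives import ParetoObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.recommenders import BotorchRecommender
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# Two competing synthetic targets, as in the example
y0 = NumericalTarget(name="y0", mode="MAX")
y1 = NumericalTarget(name="y1", mode="MIN")

# Placeholder search space with a single discrete parameter
searchspace = SearchSpace.from_product(
    [NumericalDiscreteParameter(name="x", values=(0.0, 0.5, 1.0))]
)

campaign = Campaign(
    searchspace=searchspace,
    objective=ParetoObjective(targets=[y0, y1]),
    recommender=BotorchRecommender(acquisition_function=qLogNEHVI()),
)
```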