
Pareto optimization #475

Open · wants to merge 29 commits into main from feature/pareto
Conversation

AdrianSosic (Collaborator):

This PR finally brings Pareto optimization via the new ParetoObjective class, together with an example comparing Pareto vs. single target optimization for a simple pair of synthetic targets.

Note: Support is currently limited to maximization and minimization targets. Match targets will follow but require a refactoring of the corresponding target transformation mechanism.
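
For illustration, a minimal usage sketch assembled from the snippets discussed further down in this thread; the import locations of the new ParetoObjective and qLogNEHVI classes and the exact constructor arguments are assumptions, not verbatim code from this PR:

```python
from baybe import Campaign
from baybe.acquisition import qLogNEHVI
from baybe.objectives import ParetoObjective
from baybe.parameters import NumericalContinuousParameter
from baybe.recommenders import BotorchRecommender
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# Two competing synthetic targets: maximize t_1 while minimizing t_2.
searchspace = SearchSpace.from_product(
    [NumericalContinuousParameter(name="x", bounds=(0.0, 1.0))]
)
objective = ParetoObjective(
    [NumericalTarget(name="t_1", mode="MAX"), NumericalTarget(name="t_2", mode="MIN")]
)
campaign = Campaign(
    searchspace=searchspace,
    objective=objective,
    recommender=BotorchRecommender(acquisition_function=qLogNEHVI()),
)
```

Per the discussion further down, an HVI-based acqf may also become the default for multi-output objectives, in which case the explicit acquisition_function could be omitted.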

AdrianSosic added the "new feature" (New functionality) label on Feb 3, 2025
AdrianSosic self-assigned this on Feb 3, 2025
README.md (resolved review thread)
examples/Multi_Target/pareto.py (outdated; resolved review thread)
AdrianSosic force-pushed the feature/pareto branch 2 times, most recently from 385bebe to e633f92 on February 3, 2025, 14:08
Scienfitz (Collaborator) left a comment:

first round of comments

CHANGELOG.md (outdated; resolved review thread)

abbreviation: ClassVar[str] = "qLogNEHVI"

ref_point: float | tuple[float, ...] | None = field(
Scienfitz (Collaborator):

Do we want to include the prune_baseline keyword? Sounds useful, or it could always be set to True depending on our preference.

Any merit to including some of the other keywords that might be useful? Thinking of eta, alpha, or fat.

AdrianSosic (Collaborator, Author):

Yes, why not, let's agree on a subset. I'd say let's include prune_baseline with True as default. But I would not include anything that we don't yet fully understand ourselves / stuff that does not primarily affect the optimization. So if you ask me, I'd leave it at that. Opinions?

Scienfitz (Collaborator), Feb 13, 2025:

Looking at our implementation of X_baseline
[screenshot: X_baseline implementation]
we definitely need the pruning to be true, as we don't do any pre-selection; perhaps there's no need to make it configurable, tbh.

This made me look up what is done for the noisy non-HV variant qNEI, because we have that included and also set the baseline to just all train data. Strangely, there the default for prune_baseline is True, while the HV variant here has it set to False. So I would ensure it's True by hardcoding it in base.py

Scienfitz (Collaborator):

For the other parameters, I think the only one I would possibly provide access to is alpha. If the other HV variants are also included, they need an alpha passed to the partitioning.

To simplify things we could also not make alpha configurable but set the value according to a heuristic; it seems it should be 0.0 for m<=5, and then we could add a linear increase until m=20 or so.

AdrianSosic (Collaborator, Author):

But then I think it's more elegant to just add it to our qLogNEHVI wrapper with a default value of True, just like we did for the scalar version. That way, we have a consistent, useful default while still being configurable.
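
A sketch of what that could look like in the attrs-based wrapper; the field layout and the base-class import mirror the acqfs.py excerpt above but are assumptions, not the PR's final code:

```python
from typing import ClassVar

from attrs import define, field
from attrs.validators import instance_of

from baybe.acquisition.base import AcquisitionFunction  # assumed base-class location


@define(frozen=True)
class qLogNoisyExpectedHypervolumeImprovement(AcquisitionFunction):
    """Logarithmic Monte Carlo based noisy expected hypervolume improvement."""

    abbreviation: ClassVar[str] = "qLogNEHVI"

    ref_point: float | tuple[float, ...] | None = field(default=None)
    """Optional reference point for the hypervolume computation."""

    prune_baseline: bool = field(default=True, validator=instance_of(bool))
    """Whether dominated points are pruned from the baseline before evaluation."""
```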

AdrianSosic (Collaborator, Author):

Well, botorch already has the following built-in logic (I guess this is what you are referring to)? So if we want to use a smart non-static default, I'd rather go with just calling their internal logic instead of coding it ourselves.

[screenshot: botorch's built-in alpha default logic]

Scienfitz (Collaborator):

Absolutely, let's use this.
So alpha becomes a property and not an attribute, right?
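
A sketch of the agreed direction, assuming BoTorch's get_default_partitioning_alpha helper; the import location and the way the number of targets is obtained are assumptions:

```python
from botorch.acquisition.multi_objective.utils import get_default_partitioning_alpha


def _default_alpha(n_targets: int) -> float:
    """Derive the box-decomposition approximation level from the number of targets.

    Delegates to botorch's heuristic, which uses exact partitioning (alpha = 0.0)
    for small target counts and a coarser approximation for larger ones.
    """
    return get_default_partitioning_alpha(num_objectives=n_targets)
```

Since the number of targets is only known when the botorch acqf is assembled, this would likely be evaluated at construction time rather than stored as a static attribute.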

baybe/acquisition/acqfs.py (resolved review thread)
@@ -61,6 +63,8 @@
"qUpperConfidenceBound",
# Thompson Sampling
"qThompsonSampling",
# Hypervolume Improvement
"qLogNoisyExpectedHypervolumeImprovement",
Scienfitz (Collaborator):

why not include qExpectedHypervolumeImprovement too?

AdrianSosic (Collaborator, Author):

There are in fact four choices: with/without log and with/without noisy. The reason why I haven't included the non-noisy versions yet is that I haven't really gotten into the partitioning mechanics required for those. Do you already have some insights to share here?

Scienfitz (Collaborator):

Can you explain why this matters? Does the implementation here require anything different for, e.g., just swapping one of the other functions in?

AdrianSosic (Collaborator, Author):

Yes, it requires passing an explicit partitioning object. Probably not a big deal, though; I just haven't had the time yet to fully understand the underlying mechanism. I guess this is analogous to the 1D case, where for regular EI you pass best_f but for the noisy version you don't. In that sense, the partitioning would act like the multidimensional generalization of best_f. Whichever of us gets there first can add the logic 👍🏼

Scienfitz (Collaborator):

Ok, didn't realize that.
I don't understand yet why the other methods require a partitioning, but there are exact and approximate utilities that essentially only depend on ref_point and Y, so in principle there should be no obstacle to computing a property that provides the partitioning.

Scienfitz (Collaborator):

It further seems that the interface differences might be due to legacy reasons or so; you will find for the noisy variant the alpha parameter
[screenshot: the noisy variant's alpha parameter]
which has the same role as the alpha parameter of the partitioning utility. So it appears that the partitioning is just done internally there, which imo would justify just hardcoding the partitioning to be e.g. FastNondominatedPartitioning.
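
For reference, a sketch of how the non-noisy variant could then be wired up with a hardcoded FastNondominatedPartitioning; this is written against the botorch API as I understand it, not code from this PR:

```python
import torch
from botorch.acquisition.multi_objective import qExpectedHypervolumeImprovement
from botorch.utils.multi_objective.box_decompositions.non_dominated import (
    FastNondominatedPartitioning,
)


def make_qehvi(model, ref_point: torch.Tensor, train_Y: torch.Tensor):
    """Assemble qEHVI with a partitioning built from the reference point and observed Y."""
    partitioning = FastNondominatedPartitioning(ref_point=ref_point, Y=train_Y)
    return qExpectedHypervolumeImprovement(
        model=model,
        ref_point=ref_point.tolist(),
        partitioning=partitioning,
    )
```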

README.md (resolved review thread)
searchspace=searchspace,
objective=ParetoObjective([y0, y1]),
recommender=BotorchRecommender(
acquisition_function=qLogNEHVI(),
Scienfitz (Collaborator):

Do we have the option to make an HVI-based acqf the default in case there is a multi-output objective?

AdrianSosic (Collaborator, Author):

Thanks for the reminder. I actually wanted to do this but forgot. Now it's included.

AdrianSosic (Collaborator, Author):

The question, though, is what the actual defaults should be; perhaps we should discuss this. What do you think?

Scienfitz (Collaborator), Feb 13, 2025:

In analogy to our single-task default, the default here should be the non-noisy log variant,
but tbh I don't mind if there's evidence or a practical preference for another one.

We could also debate whether the single-task default should be the noisy log variant, but this eventually becomes a question like with the priors, where we should look at the outcome on our benchmarks. So that decision can be postponed until that's ready.

AdrianSosic (Collaborator, Author):

Would have argued in the same way, but this is what is stated in one of the botorch examples:
[screenshot: excerpt from a botorch example]
I wonder why the same doesn't apply to the 1-D case though 🤔

Scienfitz (Collaborator):

Hmm, ok.

Fine if we don't have consistent defaults at the moment; as I said, I'd investigate and potentially change this after the benchmarking is more complete. I had at least one colleague already telling me some time ago that noisy EI delivered better results, so the hope would be that it's also better across the board there.
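
A hypothetical sketch of what an objective-dependent default could look like; the function, its placement, and the import paths / alias names are illustrative, not the PR's actual code:

```python
from baybe.acquisition import qLogEI, qLogNEHVI
from baybe.objectives import ParetoObjective


def default_acquisition_function(objective):
    """Choose an HVI-based acqf for Pareto objectives, the scalar default otherwise."""
    if isinstance(objective, ParetoObjective):
        return qLogNEHVI()
    return qLogEI()
```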

examples/Multi_Target/pareto.py (outdated; resolved review thread)
examples/Multi_Target/pareto.py (resolved review thread)
CHANGELOG.md (resolved review thread)
Scienfitz (Collaborator):

Detached comment 2: no particular tests? The Pareto objective is not tested: no hypothesis strategy for it, and no integrative tests like iterations (unless that's covered automatically, but I don't think we have tests that iterate over objective types).
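
For the integrative part, a sketch of what such an iteration test could look like; parameter choices and the fake-measurement handling are illustrative, not existing test code:

```python
import numpy as np

from baybe import Campaign
from baybe.objectives import ParetoObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget


def test_pareto_iterations():
    """Run two recommend/measure cycles with a ParetoObjective."""
    searchspace = SearchSpace.from_product(
        [NumericalDiscreteParameter(name="x", values=tuple(float(i) for i in range(10)))]
    )
    objective = ParetoObjective(
        [NumericalTarget(name="t_1", mode="MAX"), NumericalTarget(name="t_2", mode="MIN")]
    )
    campaign = Campaign(searchspace=searchspace, objective=objective)

    for _ in range(2):
        rec = campaign.recommend(batch_size=2)
        # Attach fake measurements for both targets and feed them back.
        rec[["t_1", "t_2"]] = np.random.rand(len(rec), 2)
        campaign.add_measurements(rec)
```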

AdrianSosic force-pushed the feature/pareto branch 3 times, most recently from 189bbe5 to 614a660 on February 13, 2025, 09:52
weights. By contrast, the Pareto approach allows specifying this trade-off
*after* the experiments have been carried out, giving the user the flexibility to adjust
their preferences post-hoc – knowing that each of the obtained points is optimal
with respect to a particular preference model. In this sense, the
Scienfitz (Collaborator):

Would drop the last sentence; doesn't seem necessary / a bit opinionated.

target_2 = NumericalTarget(name="t_2", mode="MAX")
objective = ParetoObjective(targets=[target_1, target_2])
```

Scienfitz (Collaborator):

Imo it's important that this restricts acqf choices.

So I would cross-reference here that the optimization is achieved via special acqfs, linking to the autodoc or, later on, to the acqf user guide.
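
The cross-reference could then point at the corresponding recommender setup, e.g. along these lines (a sketch mirroring the example snippet above; import paths assumed):

```python
# Pareto objectives are optimized via hypervolume-improvement acqfs, e.g.:
from baybe.acquisition import qLogNEHVI
from baybe.recommenders import BotorchRecommender

recommender = BotorchRecommender(acquisition_function=qLogNEHVI())
```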

Labels: new feature (New functionality)
Projects: None yet
Participants: 2