Pareto optimization #475
detached comment 2: no particular tests? The Pareto objective is not tested, and there is no hypothesis test for it. Also no integration tests like iterations (unless that is done automatically, but I don't think we have tests that iterate over objective types).
why not include `qExpectedHypervolumeImprovement` too?
there are in fact 4 choices: with/without `log` and with/without `noisy`. The reason why I haven't included the non-noisy versions yet is that I haven't really gotten into the partition mechanics required for those. Do you already have some insights to share here?
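For reference, the 2×2 grid of variants discussed here maps onto these botorch class names (all living in `botorch.acquisition.multi_objective`); a small lookup sketch, not BayBE code:

```python
# The 2x2 grid of hypervolume-improvement acqfs discussed above,
# keyed by (log, noisy). Class names as found in
# botorch.acquisition.multi_objective.
VARIANTS = {
    (False, False): "qExpectedHypervolumeImprovement",
    (False, True): "qNoisyExpectedHypervolumeImprovement",
    (True, False): "qLogExpectedHypervolumeImprovement",
    (True, True): "qLogNoisyExpectedHypervolumeImprovement",
}


def hvi_acqf_name(log: bool, noisy: bool) -> str:
    """Return the botorch class name for the chosen (log, noisy) combination."""
    return VARIANTS[(log, noisy)]
```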
can you explain why this matters? Does the implementation here require anything different for, e.g., just swapping one of the other functions in?
yes, it requires passing an explicit partition object. Probably not a big deal, though; I just haven't had the time yet to fully understand the underlying mechanism. I guess this is analogous to the 1D case, where for regular EI you pass `best_f` but for the noisy version you don't. In that sense, the partitioning would act like the multidimensional generalization of `best_f`. Whoever of us gets there first can add the logic 👍🏼
ok, didn't realize that. I don't understand yet why the other methods require a `partitioning`, but there are exact and approximate utilities that essentially only depend on `ref_point` and `Y`, so in principle there should be no obstacle to compute a property that provides the `partitioning`.
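To illustrate why `ref_point` and `Y` should suffice: a box decomposition only ever operates on the non-dominated subset of `Y`, which is itself a pure function of `Y`. A minimal pure-Python sketch of that core step (maximization convention; this is not botorch's actual partitioning implementation, just the filter it builds on):

```python
def non_dominated(Y):
    """Return the points of Y (a list of objective tuples, maximization)
    that are not dominated by any other point in Y."""

    def dominates(a, b):
        # a dominates b if a >= b in every objective and > in at least one
        return all(x >= y for x, y in zip(a, b)) and any(
            x > y for x, y in zip(a, b)
        )

    return [p for p in Y if not any(dominates(q, p) for q in Y if q != p)]
```

A partitioning object like botorch's `FastNondominatedPartitioning` is then constructed from exactly this front plus the `ref_point`, which is why a derived property seems feasible.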
it further seems that the interface differences might be due to legacy things or so. For the noisy variant, you will find an
[screenshot: the noisy variant's signature, showing its `alpha` parameter]
`alpha` parameter, which has the same role as the alpha parameter for the partitioning utility. So it appears the partitioning is just done internally there, which imo would justify hardcoding `partitioning` to be e.g. `FastNondominatedPartitioning`.
we have no mechanism to distinguish that this is a multi-output acqf; do we need one?
I think this acqf could also work in 1D, is that the case?
No, there's no such mechanism at the moment. I'm not even sure if it exists in botorch / what would happen if you create an incompatible setting there; will need to check. But it definitely makes sense for us to somehow be able to validate the input and throw a corresponding user exception. How do we go about it? A binary class flag?
well, first we need to know whether the HVI-based acqfs also support m=1. If so, they are available for all existing iteration tests and we'd just have to create maybe a new one for multiple targets.
In any case, a new property like `supports_multi_output` or similar wouldn't hurt? So far it would only be acqfs with `Hypervolume` in their name.
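A hypothetical sketch of such a flag, based on the name heuristic mentioned above (class and property names are suggestions, not existing BayBE API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionFunction:
    """Hypothetical acqf base class sketch."""

    @property
    def supports_multi_output(self) -> bool:
        # Heuristic from the discussion: so far, only the
        # hypervolume-based acqfs handle multiple outputs.
        return "Hypervolume" in type(self).__name__


@dataclass(frozen=True)
class qLogNoisyExpectedHypervolumeImprovement(AcquisitionFunction):
    pass


@dataclass(frozen=True)
class qLogExpectedImprovement(AcquisitionFunction):
    pass
```

A validation step could then raise a user exception when a multi-target objective meets an acqf whose flag is `False`.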
do we want to include the `prune_baseline` keyword? Sounds useful, or it could always be set to true, depending on our preference. Any merit in including some of the other keywords that might be useful? Thinking of `eta`, `alpha`, or `fat`.
Yes, why not, let's agree on a subset. I'd say let's include `prune_baseline` with `True` as default. But I would not include anything that we don't yet fully understand ourselves / stuff that does not primarily affect the optimization. So if you ask me, I'd leave it at that. Opinions?
Looking at our implementation of `X_baseline`
[screenshot: our code passing all training data as `X_baseline`]
we definitely need the pruning to be true, as we don't do any pre-selection; perhaps there's no need to make it configurable, tbh.
This made me look up what is done for the noisy non-HV variant `qNEI`, because we have that included and also set the baseline to just all train data. Strangely, there the default for `prune_baseline` is `True`, while the HV variant here has it set to `False`. So I would ensure it's `True` by hardcoding it.
[file: `base.py`]
For the other parameters, I think the only one I would possibly provide access to is `alpha`. If the other HV variants are also included, they need an alpha passed to `partitioning`.
To simplify things, we could also not make `alpha` configurable but set the value according to a heuristic; it seems it should be 0.0 for m<=5, and then we could add a linear increase until m=20 or so.
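The heuristic floated above could look like this. The breakpoints m=5 and m=20 come from this thread; the cap value `max_alpha=0.1` is an assumption for illustration, and botorch also ships its own default logic for this, which may be preferable:

```python
def default_partitioning_alpha(m: int, max_alpha: float = 0.1) -> float:
    """Heuristic default for the partitioning `alpha`: exact partitioning
    (alpha = 0) up to m = 5 objectives, then a linear ramp reaching
    `max_alpha` at m = 20 (capped beyond that)."""
    if m <= 5:
        return 0.0
    return max_alpha * min(m - 5, 15) / 15
```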
But then I think it's more elegant to just add it to our qLogNEHVI wrapper with default value `True`, just like we did for the scalar version. That way, we have a consistent useful default while still being configurable.
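A minimal sketch of what such a wrapper default could look like (field and method names are assumptions; the actual BayBE wrapper may differ):

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class qLogNoisyExpectedHypervolumeImprovement:
    """Hypothetical wrapper sketch: `prune_baseline` stays configurable,
    but defaults to pruning, unlike the botorch default for this variant."""

    prune_baseline: bool = True

    def to_botorch_kwargs(self) -> dict:
        # Forwarded to the botorch constructor of the same name.
        return asdict(self)
```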
Well, botorch already has the following built-in logic (I guess this is what you are referring to)? So if we want to use a smart non-static default, I'd rather just call their internal logic instead of coding it ourselves.
absolutely, let's use this. So `alpha` becomes a property and not an attribute, right?
would it make any sense to support weights too? Or would it just warp the Pareto front and not change anything?
This is an interesting question 🤔 So my thoughts are: a Pareto approach is inherently "neutral", i.e. there is no notion of one target being more/less important than another. This conflicts with the idea of attaching weights to objectives from the ground up. So I'd say: no. The "weighted" objective used in the code is just an instrument to achieve the sign flip.
Nevertheless, I also wonder what would actually happen if we attached some weights... Thinking graphically, I think it would basically correspond to an axis-specific stretch of the dimensions of the target-space plot. And if that's the case, the hypervolume computation would be a bit "skewed". But in the limit, when you increase the batch size, you'd still cover the same front, I guess. What are your thoughts?
I have some new insights, will share later
Do you know whether the weights are applied after the single-target acqf values are calculated? Because in that case:

Let's look at two points, each described by an EI value for each target (here called $t$): $(t_1^{(1)}, t_2^{(1)})$ and $(t_1^{(2)}, t_2^{(2)})$.

The hypervolume (w.r.t. the origin as reference point) then is $t_1 \cdot t_2$.

If we assume the HVI for the first point is larger than for the second, then this is true:

$$t_1^{(1)} \, t_2^{(1)} > t_1^{(2)} \, t_2^{(2)}$$

In any other coordinate system where the $t$ values are scaled by weights $(w_1, w_2)$ we have:

$$(w_1 t_1^{(1)}) (w_2 t_2^{(1)}) = w_1 w_2 \, t_1^{(1)} t_2^{(1)} > w_1 w_2 \, t_1^{(2)} t_2^{(2)} = (w_1 t_1^{(2)}) (w_2 t_2^{(2)})$$

This is also true in the case of $<$ or $=$ (assuming positive weights). It would mean the relative order of two arbitrary points can never be changed by any weights, making them obsolete.
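The argument above can be checked numerically: scaling both objectives by positive weights multiplies every origin-referenced hypervolume by the same factor $w_1 w_2$, so the ranking of points is preserved. A quick sketch for the 2-objective, single-point case:

```python
def hypervolume(point, weights=(1.0, 1.0)):
    """Hypervolume of a single 2D point w.r.t. the origin as reference,
    after scaling each objective by a positive weight."""
    t1, t2 = point
    w1, w2 = weights
    return (w1 * t1) * (w2 * t2)


p1, p2 = (3.0, 2.0), (1.0, 4.0)  # unweighted hypervolumes: 6.0 and 4.0

# The ranking is invariant under any positive weights:
for w in [(1.0, 1.0), (0.5, 10.0), (7.0, 0.1)]:
    assert (hypervolume(p1, w) > hypervolume(p2, w)) == (
        hypervolume(p1) > hypervolume(p2)
    )
```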
Let's discuss this face-to-face, I think will be easier