Refactor Surrogates #338

Merged
merged 104 commits into from
Aug 29, 2024

AdrianSosic
Collaborator

@AdrianSosic AdrianSosic commented Aug 9, 2024

Completes the surrogate refactoring, which extended over #278, #309, #315, #325, and #337.

Most important changes

  • The transition point from experimental to computational representation has been moved from the recommender to the surrogate. From an architecture/responsibility perspective, this is reasonable since the recommender should not have to deal with algorithmic/computational details.
  • The desired consequence is that public Surrogate methods like posterior and fit can now operate on dataframes in experimental representation, meaning they can also be exposed directly to the user.
  • The new posterior methods now all return a general Posterior object instead of implicitly assuming Gaussian distributions. This paves the way for arbitrary surrogate extensions, such as Bernoulli/Categorical surrogates, etc. At the moment, this introduces an explicit coupling to botorch, which is fine because botorch remains a core dependency and the only backend used for complex surrogate modeling. In the future, this can be further abstracted by introducing our own Posterior class.
  • The Surrogate layout has been refined such that the extracted SurrogateProtocol, which now defines the formal interface for all surrogates, imposes minimal requirements on the user.
  • Scaling has been completely redesigned, offering the possibility to configure input/output scaling down to the level of individual parameters and targets. The configuration is currently class-specific, but can be extended to allow surrogate-instance-specific rules in the future.
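
To illustrate the idea of a minimal formal interface combined with a general posterior return type, here is a rough sketch in plain Python. It is not the actual `SurrogateProtocol` from this PR: the names `SurrogateLike`, `ToyPosterior`, and `ConstantSurrogate` are hypothetical, the method signatures are assumptions, and a list of dicts stands in for a dataframe in experimental representation.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable

# Stand-in for a dataframe in experimental representation: one dict per row.
Records = list[dict[str, Any]]


@dataclass(frozen=True)
class ToyPosterior:
    """Hypothetical placeholder for a posterior object (not botorch's class)."""

    mean: list[float]
    variance: list[float]


@runtime_checkable
class SurrogateLike(Protocol):
    """Hypothetical minimal interface: fit on data, then return a posterior."""

    def fit(self, measurements: Records, target: str) -> None: ...

    def posterior(self, candidates: Records) -> ToyPosterior: ...


class ConstantSurrogate:
    """Predicts the training mean everywhere; no base class is inherited."""

    def fit(self, measurements: Records, target: str) -> None:
        values = [float(row[target]) for row in measurements]
        self._mean = sum(values) / len(values)
        # Population variance of the observed target values.
        self._var = sum((v - self._mean) ** 2 for v in values) / len(values)

    def posterior(self, candidates: Records) -> ToyPosterior:
        n = len(candidates)
        return ToyPosterior(mean=[self._mean] * n, variance=[self._var] * n)
```

Because the protocol is structural, `ConstantSurrogate` satisfies it without inheriting from anything, which is roughly what "minimal requirements on the user" could look like in practice.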

This PR is a first step toward a refactored `Surrogate` layout:
* It moves the `exp_rep`-to-`comp_rep` transition point into the
surrogates, which now become responsible for handling the transformation
and can do it in whatever way they need it (which will also simplify
scaling later on).
* This cleans up the interface because users / calling classes can now
pass data in its canonical form (i.e., `exp_rep`) and do not need to
worry about transformations.
* This also means we can easily expose the trained surrogates to users
for model inspection (e.g., feature importance).
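
A minimal sketch of what moving the `exp_rep`-to-`comp_rep` transition into the surrogate could look like is given here. Everything in it is an assumption for illustration: the class `EncodingSurrogate`, its `categories` argument (a stand-in for search space information), and the choice of one-hot encoding are hypothetical and not taken from the PR.

```python
from typing import Any

# Stand-in for a dataframe in experimental representation: one dict per row.
Records = list[dict[str, Any]]


class EncodingSurrogate:
    """Toy surrogate that owns the exp_rep -> comp_rep transformation."""

    def __init__(self, categories: dict[str, list[str]]) -> None:
        # Hypothetical search space stand-in: labels per categorical parameter.
        self._categories = categories

    def _to_comp_rep(self, rows: Records, columns: list[str]) -> list[list[float]]:
        """One-hot encode categorical columns; pass numerical columns through."""
        encoded = []
        for row in rows:
            vec: list[float] = []
            for col in columns:
                if col in self._categories:
                    vec += [float(row[col] == lab) for lab in self._categories[col]]
                else:
                    vec.append(float(row[col]))
            encoded.append(vec)
        return encoded

    def fit(self, measurements: Records, target: str) -> None:
        # Public entry point: accepts data in its canonical (experimental) form.
        self._columns = [c for c in measurements[0] if c != target]
        x = self._to_comp_rep(measurements, self._columns)
        y = [float(row[target]) for row in measurements]
        self._fit(x, y)

    def _fit(self, x: list[list[float]], y: list[float]) -> None:
        # Purely numerical from here on; kept trivial by just storing the data.
        self._train_x, self._train_y = x, y
```

The caller only ever passes labelled rows; the encoding is an internal detail that each surrogate is free to implement differently.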
This PR is a next step toward a lean surrogate class layout:
* `Surrogate.(_)posterior` now returns a `Posterior` object
* `Surrogate._fit` no longer expects the `SearchSpace` as an argument,
which brings us closer to the state that `.fit` and `.posterior` operate
on the user/dataframe/context-level while `_posterior` and `_fit`
operate on the purely mathematical level. This means that a user who
writes their own surrogate class effectively only needs to implement the
corresponding mathematical model in the latter two methods. Optional
context information that may be required for this implementation (like
the dimension index of the `TaskParameter` in the passed `Tensor`
object) is now encapsulated into a surrogate-specific `context` object,
that can be arbitrarily populated by the surrogate class, but whose
logic is now cleanly separated from the actual fitting logic.
* Adds a new `GaussianSurrogate` base class for (most of our other) models
that come with the implicit Gaussian noise assumption and effectively
only implement mean and (co-)variance estimation.
* Improves and simplifies the logic of `catch_constant_target`,
re-enabling slots for `Surrogates`
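
The split described above, where a Gaussian base class builds the posterior and subclasses only implement the mathematical model, might be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `GaussianBase`, `GaussianPosterior`, `_estimate_moments`, and `NearestNeighbor` are hypothetical names, plain lists replace tensors, and the context object is omitted.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianPosterior:
    """Hypothetical container holding per-point mean and variance."""

    mean: list[float]
    variance: list[float]


class GaussianBase(ABC):
    """Hypothetical base: subclasses supply moments, the base wraps them."""

    def _posterior(self, x: list[list[float]]) -> GaussianPosterior:
        mean, var = self._estimate_moments(x)
        return GaussianPosterior(mean, var)

    @abstractmethod
    def _fit(self, x: list[list[float]], y: list[float]) -> None: ...

    @abstractmethod
    def _estimate_moments(
        self, x: list[list[float]]
    ) -> tuple[list[float], list[float]]: ...


class NearestNeighbor(GaussianBase):
    """1-nearest-neighbour mean; variance equals the squared distance."""

    def _fit(self, x, y):
        self._x, self._y = x, y

    def _estimate_moments(self, x):
        means, variances = [], []
        for point in x:
            # Squared Euclidean distance to every training point.
            dists = [sum((a - b) ** 2 for a, b in zip(point, ref)) for ref in self._x]
            i = dists.index(min(dists))
            means.append(self._y[i])
            variances.append(dists[i])
        return means, variances
```

A model author in this sketch writes only `_fit` and `_estimate_moments`; wrapping the result into a posterior object stays in the base class.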
Preparation for use with sklearn's ColumnTransformer, which spits out arrays
@AdrianSosic AdrianSosic added this to the Surrogate refactoring milestone Aug 9, 2024
@AdrianSosic AdrianSosic changed the title Dev/surrogates Refactor Surrogates Aug 9, 2024
@AdrianSosic AdrianSosic marked this pull request as ready for review August 9, 2024 11:52
@AdrianSosic
Collaborator Author

@AVHopp, @Scienfitz Finally, the epic is completed 😎 Last gate before merging into main

@AdrianSosic AdrianSosic added the new feature New functionality label Aug 9, 2024
Collaborator

@AVHopp AVHopp left a comment

Mainly minor things. Thanks for this epic piece of work 🏆

Resolved review threads: baybe/surrogates/base.py (8), baybe/surrogates/utils.py, baybe/utils/scaling.py (outdated), CHANGELOG.md
Since the searchspace needs to be stored during fitting anyway (because it is needed by the posterior method), we can simply use a regular attribute access and do not need to pass it via the context method argument
@AVHopp AVHopp mentioned this pull request Aug 27, 2024
Resolved review threads: baybe/recommenders/pure/bayesian/botorch.py, baybe/searchspace/core.py, baybe/utils/dataframe.py (outdated), baybe/surrogates/base.py (4, two outdated)
@AdrianSosic AdrianSosic mentioned this pull request Aug 28, 2024
@AdrianSosic AdrianSosic merged commit e0adbf8 into main Aug 29, 2024
17 of 19 checks passed
@AdrianSosic AdrianSosic deleted the dev/surrogates branch August 29, 2024 12:10
Labels
enhancement, new feature, refactor
3 participants