Refactor Surrogates #338

Merged on Aug 29, 2024 (104 commits)
Commits
95db813
Remove no longer needed utilities
AdrianSosic Jun 20, 2024
3a6ca58
Move exp rep to comp rep transition point to surrogate
AdrianSosic Jun 20, 2024
023b5aa
Transform to dataframe instead of tensor
AdrianSosic Jun 21, 2024
376ffb1
Update CHANGELOG.md
AdrianSosic Jun 25, 2024
c0fb982
Make recommenders operate on experimental representation
AdrianSosic Jun 26, 2024
126bc77
Add missing type hints to transform attributes
AdrianSosic Jun 28, 2024
c8c465c
Add TODO for measurement validation
AdrianSosic Jun 28, 2024
78e3938
Add reference to docstring
AdrianSosic Jun 28, 2024
f1406ae
Comp Rep Transition Point (#278)
AdrianSosic Jul 3, 2024
602c646
Refactor signatures using model context
AdrianSosic Jul 8, 2024
e62d4e2
Add GaussianSurrogate base class
AdrianSosic Jul 9, 2024
e708fd5
Update constant target catching
AdrianSosic Jul 9, 2024
9140b07
Store constant target fallback models outside of surrogate instances
AdrianSosic Jul 9, 2024
d094a1a
Catch passing of unimplemented posterior options
AdrianSosic Jul 9, 2024
5f095de
Update CHANGELOG.md
AdrianSosic Jul 9, 2024
32e2ef5
Add docstrings to override methods in decorator
AdrianSosic Jul 11, 2024
956d1d4
Surrogate Posteriors (#309)
AdrianSosic Jul 15, 2024
d9aefe5
Remove current scaling functionality
AdrianSosic Jul 9, 2024
369da45
Make to_tensor also handle numpy arrays
AdrianSosic Jul 16, 2024
0ede1cc
Replace param_bounds_comp with comp_rep_bounds
AdrianSosic Jul 16, 2024
00c40ae
Draft input scaling mechanism
AdrianSosic Jul 17, 2024
79f8f44
Introduce ScalerProtocol class
AdrianSosic Jul 19, 2024
24f2c49
Make transformation return a dataframe
AdrianSosic Jul 19, 2024
2938c48
Update streamlit dev script
AdrianSosic Jul 19, 2024
ae1a366
Fix handling of dropped columns in ColumnTransformer
AdrianSosic Jul 19, 2024
5068148
Remove obsolete TODO note
AdrianSosic Jul 19, 2024
fb14927
Make surrogate scaling work with continuous parameters
AdrianSosic Jul 19, 2024
c3a4cc6
Rename _get_parameter_scaler to _make_parameter_scaler
AdrianSosic Jul 19, 2024
64b5450
Draft output scaling mechanism
AdrianSosic Jul 22, 2024
6dad04a
Silence warning by allowing extra columns
AdrianSosic Jul 22, 2024
25e356a
Improve signatures
AdrianSosic Jul 22, 2024
2a2849b
Harmonize terminology
AdrianSosic Jul 22, 2024
920b079
Update test for empty bounds
AdrianSosic Jul 22, 2024
cdf6688
Fix import order
AdrianSosic Jul 22, 2024
6e052f7
Decide for transformation approach
AdrianSosic Jul 22, 2024
ef84a35
Update docstrings
AdrianSosic Jul 22, 2024
2b3dcab
Remove separate scaling logic from GPs
AdrianSosic Jul 23, 2024
161bddb
Rename ScalerProtocol to ParameterScalerProtocol
AdrianSosic Jul 23, 2024
e7f3f67
Update CHANGELOG.md
AdrianSosic Jul 23, 2024
21953d4
Replace literal return type with None
AdrianSosic Jul 23, 2024
536a3a8
Implement workaround to circumvent ColumnTransformer limitations
AdrianSosic Jul 24, 2024
b88b3ba
Improve code grouping
AdrianSosic Jul 24, 2024
1619bd7
Remove register_custom_architecture decorator
AdrianSosic Jul 24, 2024
2f5fa21
Surrogate scaling (#315)
AdrianSosic Jul 24, 2024
05ed596
Introduce SurrogateProtocol class to enable custom architectures
AdrianSosic Jul 24, 2024
525aed3
Fix typo in method reference
AdrianSosic Jul 24, 2024
be51163
Refactor transformation steps in acquisition function translation
AdrianSosic Jul 24, 2024
a1eacc0
Fix remaining surrogate-external transformation calls
AdrianSosic Jul 24, 2024
53e4ade
Implement torch-based column transformer
AdrianSosic Jul 25, 2024
255d91b
Refactor scaling logic
AdrianSosic Jul 25, 2024
47e6819
Add missing transform flags
AdrianSosic Jul 26, 2024
ee6439c
Register de-/serialization hooks for SurrogateProtocol
AdrianSosic Jul 26, 2024
befd68f
Add details on the requirements imposed by the surrogate protocol
AdrianSosic Jul 26, 2024
90fd62d
Lazify validation of column transformer
AdrianSosic Jul 26, 2024
a84dd8e
Optimize input scaling logic using walrus
AdrianSosic Jul 31, 2024
c12a992
Rephrase ColumnTransformer docstring
AdrianSosic Jul 31, 2024
629f742
Avoid allow_missing=True by accessing discrete subspace
AdrianSosic Jul 31, 2024
179fa5a
Make target scaler method return a factory
AdrianSosic Jul 31, 2024
8168c01
Mention customization in scaler method docstrings
AdrianSosic Jul 31, 2024
b1ec8d3
Add validation to column index extraction method
AdrianSosic Jul 31, 2024
b83f39c
Expand docstring of get_comp_rep_parameter_indices
AdrianSosic Jul 31, 2024
aafe2f7
Add TODO note
AdrianSosic Jul 31, 2024
6dabe87
Explicitly handle train/eval mode in ColumnTransformer
AdrianSosic Jul 31, 2024
57ee8b3
Add docstring sections to posterior methods
AdrianSosic Jul 31, 2024
bcbcf3f
Account for potentially non-existing output scaler
AdrianSosic Aug 7, 2024
5e63cb7
Fix bug in GP posterior computation
AdrianSosic Jul 31, 2024
ab26cdd
Change plan and expose internal GP, but with disabled base class scaling
AdrianSosic Aug 7, 2024
98f13aa
Surrogate interface (#325)
AdrianSosic Aug 8, 2024
06a99b9
Activate mypy for surrogates
AdrianSosic Aug 7, 2024
84c0bd4
Ignore method overrides
AdrianSosic Aug 8, 2024
3a5909f
Fix types of returned objects
AdrianSosic Aug 8, 2024
8b4194b
Add ngboost to mypy ignores
AdrianSosic Aug 8, 2024
f9905f6
Simplify sklearn mypy ignores
AdrianSosic Aug 8, 2024
887bcef
Fix model context typing
AdrianSosic Aug 8, 2024
cbf2f5a
Fix signature of CustomONNXSurrogate._fit
AdrianSosic Aug 8, 2024
c13fa55
Add explicit return values
AdrianSosic Aug 8, 2024
74b7e58
Disable output scaling for tree-based surrogates
AdrianSosic Aug 8, 2024
944ac8e
Add sklearn_extra to mypy ignores
AdrianSosic Aug 8, 2024
1175cd9
Fix GP creation from preset
AdrianSosic Aug 8, 2024
caa72ae
Update path to Objective class
AdrianSosic Aug 8, 2024
7a0f9e6
Fix return type
AdrianSosic Aug 8, 2024
67ccfd1
Temporarily suppress mypy errors for batchify
AdrianSosic Aug 8, 2024
eaefde6
Raise error when attempting to access posterior before training
AdrianSosic Aug 8, 2024
1230978
Add typing workaround for accessing optional attributes
AdrianSosic Aug 8, 2024
08c3c2b
Clean up mypy.ini
AdrianSosic Aug 8, 2024
b1fd6f0
Fix surrogate docstrings
AdrianSosic Aug 8, 2024
76013ac
Update ignore list in conf.py
AdrianSosic Aug 9, 2024
b186360
Make _ModelContext public to avoid sphinx problems
AdrianSosic Aug 9, 2024
b11f6f3
Mypy for surrogates (#337)
AdrianSosic Aug 9, 2024
f73caee
Merge branch 'main' into dev/surrogates
AdrianSosic Aug 9, 2024
c00dee5
Fix mypy error
AdrianSosic Aug 9, 2024
c06cd46
Allow extra columns in public posterior call
AdrianSosic Aug 9, 2024
1db3a9c
Add missing blank line to example
AdrianSosic Aug 9, 2024
a7b3b4e
Update use of surrogate in examples
AdrianSosic Aug 9, 2024
a1d3715
Update CHANGELOG.md
AdrianSosic Aug 9, 2024
d122d69
Indicate in docstring that scalers are fitted
AdrianSosic Aug 13, 2024
1ad8d8f
Add backticks to docstring reference
AdrianSosic Aug 14, 2024
4b22df6
Remove context argument from _fit signature
AdrianSosic Aug 21, 2024
b74fbdf
Silence mypy error
AdrianSosic Aug 26, 2024
4f9e3cc
Replace generator comprehension with tuple in to_tensor
AdrianSosic Aug 28, 2024
3b1635d
Indicate candidates domain using suffixes
AdrianSosic Aug 28, 2024
8e688c1
Rename _posterior_comp_rep to _posterior_comp
AdrianSosic Aug 28, 2024
ed32ab8
Refine docstrings
AdrianSosic Aug 28, 2024
3768774
Merge branch 'main' into dev/surrogates
AdrianSosic Aug 29, 2024
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -5,14 +5,40 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Breaking Changes
- The public methods of `Surrogate` models now operate on dataframes in experimental
representation instead of tensors in computational representation
- `Surrogate.posterior` now returns a `Posterior` object
- `param_bounds_comp` of `SearchSpace`, `SubspaceDiscrete` and `SubspaceContinuous` has
been replaced with `comp_rep_bounds`, which returns a dataframe

### Added
- `py.typed` file to enable the use of type checkers on the user side
- `GaussianSurrogate` base class for surrogate models with Gaussian posteriors
- `comp_rep_columns` property for `Parameter`, `SearchSpace`, `SubspaceDiscrete`
and `SubspaceContinuous` classes
- New mechanisms for surrogate input/output scaling configurable per class
- `SurrogateProtocol` as an interface for user-defined surrogate architectures

### Changed
- The transition from experimental to computational representation no longer happens
in the recommender but in the surrogate
- Fallback models created by `catch_constant_targets` are stored outside the surrogate
- `to_tensor` now also handles `numpy` arrays

### Fixed
- `CategoricalParameter` and `TaskParameter` no longer incorrectly coerce a single
string input to categories/tasks
- `farthest_point_sampling` no longer depends on the provided point order

### Removed
- `register_custom_architecture` decorator
- `Scaler` and `DefaultScaler` classes

### Deprecations
- The role of `register_custom_architecture` has been taken over by
`baybe.surrogates.base.SurrogateProtocol`

## [0.10.0] - 2024-08-02
### Breaking Changes
- Providing an explicit `batch_size` is now mandatory when asking for recommendations
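The CHANGELOG entry on surrogate input/output scaling does not show the scaler code itself. As a rough illustration only, the sketch below applies the two most common choices: min-max scaling of each input column to [0, 1] from per-column bounds (the commits mention `comp_rep_bounds` as a source of such bounds) and standardization of the targets. `MinMaxInputScaler` and `StandardOutputScaler` are hypothetical names, not baybe classes, and plain dictionaries stand in for dataframes.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class MinMaxInputScaler:
    """Hypothetical input scaler mapping each column to [0, 1] via given bounds."""

    bounds: dict[str, tuple[float, float]]  # column name -> (lower, upper)

    def transform(self, row: dict[str, float]) -> dict[str, float]:
        scaled = {}
        for col, value in row.items():
            lower, upper = self.bounds[col]
            width = upper - lower
            # Zero-width bounds are mapped to 0.0 to avoid dividing by zero
            scaled[col] = (value - lower) / width if width else 0.0
        return scaled


@dataclass
class StandardOutputScaler:
    """Hypothetical target scaler producing zero mean and unit standard deviation."""

    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, targets: list[float]) -> "StandardOutputScaler":
        self.mu = mean(targets)
        # Constant targets have zero spread; fall back to 1.0 to keep the map invertible
        self.sigma = pstdev(targets) or 1.0
        return self

    def transform(self, y: float) -> float:
        return (y - self.mu) / self.sigma

    def inverse_transform(self, y_scaled: float) -> float:
        # Maps model predictions back to the original target scale
        return y_scaled * self.sigma + self.mu


x_scaler = MinMaxInputScaler({"Temperature": (20.0, 120.0)})
print(x_scaler.transform({"Temperature": 70.0}))  # {'Temperature': 0.5}
y_scaler = StandardOutputScaler().fit([1.0, 3.0])
print(y_scaler.transform(3.0), y_scaler.inverse_transform(1.0))  # 1.0 3.0
```

Keeping both scalers inside the surrogate, rather than in the recommender, would let callers keep passing unscaled experimental data, consistent with the PR moving the experimental-to-computational transition into the surrogate.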

2 changes: 2 additions & 0 deletions baybe/acquisition/acqfs.py
@@ -91,6 +91,8 @@ def get_integration_points(self, searchspace: SearchSpace) -> pd.DataFrame:
ValueError: If the search space is purely continuous and
'sampling_n_points' was not provided.
"""
# TODO: Move the core logic to `SearchSpace` and ``Subspace*`` classes

sampled_parts: list[pd.DataFrame] = []
n_candidates: int | None = None

21 changes: 15 additions & 6 deletions baybe/acquisition/base.py
@@ -10,14 +10,15 @@
import pandas as pd
from attrs import define

from baybe.searchspace import SearchSpace
from baybe.objectives.base import Objective
from baybe.searchspace.core import SearchSpace
from baybe.serialization.core import (
converter,
get_base_structure_hook,
unstructure_base,
)
from baybe.serialization.mixin import SerialMixin
from baybe.surrogates.base import Surrogate
from baybe.surrogates.base import SurrogateProtocol
from baybe.utils.basic import classproperty, match_attributes
from baybe.utils.boolean import is_abstract
from baybe.utils.dataframe import to_tensor
@@ -42,14 +43,22 @@ def _non_botorch_attrs(cls) -> tuple[str, ...]:

def to_botorch(
self,
surrogate: Surrogate,
surrogate: SurrogateProtocol,
searchspace: SearchSpace,
train_x: pd.DataFrame,
train_y: pd.DataFrame,
objective: Objective,
measurements: pd.DataFrame,
):
"""Create the botorch-ready representation of the function."""
"""Create the botorch-ready representation of the function.

The required structure of `measurements` is specified in
:meth:`baybe.recommenders.base.RecommenderProtocol.recommend`.
"""
import botorch.acquisition as botorch_acqf_module

# Get computational data representations
train_x = searchspace.transform(measurements, allow_extra=True)
train_y = objective.transform(measurements)

# Retrieve corresponding botorch class
acqf_cls = getattr(botorch_acqf_module, self.__class__.__name__)
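With this change, `to_botorch` receives the raw `measurements` and computes `train_x` via `searchspace.transform(measurements, allow_extra=True)` and `train_y` via `objective.transform(measurements)`; `allow_extra=True` is presumably needed because the measurements also carry target columns. The stdlib sketch below mimics that flow with lists of dictionaries. The helper names, the one-hot encoding of categorical parameters, and the `f"{name}_{label}"` column naming are assumptions for illustration, not baybe's actual encodings.

```python
def transform_inputs(measurements, numerical, categorical, allow_extra=False):
    """Hypothetical exp-rep -> comp-rep transform of the parameter columns.

    Args:
        measurements: Rows in experimental representation (parameters + targets).
        numerical: Names of numerical parameters, passed through as floats.
        categorical: Mapping of categorical parameter name -> allowed labels,
            one-hot encoded into columns named ``f"{name}_{label}"``.
        allow_extra: Whether non-parameter columns (e.g. targets) are tolerated.
    """
    known = set(numerical) | set(categorical)
    comp_rows = []
    for row in measurements:
        extra = set(row) - known
        if extra and not allow_extra:
            raise ValueError(f"Unexpected columns: {sorted(extra)}")
        comp = {name: float(row[name]) for name in numerical}
        for name, labels in categorical.items():
            for label in labels:
                comp[f"{name}_{label}"] = 1.0 if row[name] == label else 0.0
        comp_rows.append(comp)
    return comp_rows


def transform_targets(measurements, target):
    """Hypothetical objective transform: extract a single target column."""
    return [float(row[target]) for row in measurements]


measurements = [
    {"Temperature": 50, "Solvent": "water", "Yield": 70.0},
    {"Temperature": 80, "Solvent": "ethanol", "Yield": 40.0},
]
train_x = transform_inputs(
    measurements, ["Temperature"], {"Solvent": ("water", "ethanol")}, allow_extra=True
)
train_y = transform_targets(measurements, "Yield")
print(train_x[0])  # {'Temperature': 50.0, 'Solvent_water': 1.0, 'Solvent_ethanol': 0.0}
print(train_y)  # [70.0, 40.0]
```

Because these transforms now happen behind `to_botorch`, a caller such as `_setup_botorch_acqf` only needs to pass `objective` and the experimental-representation `measurements`.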

4 changes: 4 additions & 0 deletions baybe/exceptions.py
@@ -61,5 +61,9 @@ class UnidentifiedSubclassError(Exception):
"""A specified subclass cannot be found in the given class hierarchy."""


class ModelNotTrainedError(Exception):
"""A prediction/transformation is attempted before the model has been trained."""


class UnmatchedAttributeError(Exception):
"""An attribute cannot be matched against a certain callable signature."""
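The new `ModelNotTrainedError` covers predictions or transformations attempted before training (compare the commit "Raise error when attempting to access posterior before training"). The sketch below illustrates that guard pattern; `ToySurrogate` and its mean-only "posterior" are hypothetical stand-ins, and the exception class is redefined locally so the snippet runs without baybe installed.

```python
from statistics import mean


class ModelNotTrainedError(Exception):
    """A prediction/transformation is attempted before the model has been trained."""


class ToySurrogate:
    """Hypothetical surrogate whose 'posterior' is just the training mean."""

    def __init__(self) -> None:
        self._mean: float | None = None  # None signals "not trained yet"

    def fit(self, targets: list[float]) -> None:
        self._mean = mean(targets)

    def posterior_mean(self, n_candidates: int) -> list[float]:
        # Refuse to predict before training instead of failing obscurely later
        if self._mean is None:
            raise ModelNotTrainedError(
                "The surrogate must be trained before computing a posterior."
            )
        return [self._mean] * n_candidates


surrogate = ToySurrogate()
try:
    surrogate.posterior_mean(2)
except ModelNotTrainedError as exc:
    print(f"Caught: {exc}")
surrogate.fit([1.0, 2.0, 3.0])
print(surrogate.posterior_mean(2))  # [2.0, 2.0]
```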

19 changes: 15 additions & 4 deletions baybe/parameters/base.py
@@ -55,10 +55,6 @@ def is_in_range(self, item: Any) -> bool:
``True`` if the item is within the parameter range, ``False`` otherwise.
"""

@abstractmethod
def summary(self) -> dict:
"""Return a custom summarization of the parameter."""

def __str__(self) -> str:
return str(self.summary())

@@ -72,12 +68,21 @@ def is_discrete(self) -> bool:
"""Boolean indicating if this is a discrete parameter."""
return isinstance(self, DiscreteParameter)

@property
@abstractmethod
def comp_rep_columns(self) -> tuple[str, ...]:
"""The columns spanning the computational representation."""

def to_searchspace(self) -> SearchSpace:
"""Create a one-dimensional search space from the parameter."""
from baybe.searchspace.core import SearchSpace

return SearchSpace.from_parameter(self)

@abstractmethod
def summary(self) -> dict:
"""Return a custom summarization of the parameter."""


@define(frozen=True, slots=False)
class DiscreteParameter(Parameter, ABC):
@@ -97,8 +102,14 @@ def values(self) -> tuple:
@cached_property
@abstractmethod
def comp_df(self) -> pd.DataFrame:
# TODO: Should be renamed to `comp_rep`
"""Return the computational representation of the parameter."""

@property
def comp_rep_columns(self) -> tuple[str, ...]: # noqa: D102
# See base class.
return tuple(self.comp_df.columns)

def to_subspace(self) -> SubspaceDiscrete:
"""Create a one-dimensional search space from the parameter."""
from baybe.searchspace.discrete import SubspaceDiscrete
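The new abstract `comp_rep_columns` property lets every parameter report the columns spanning its computational representation: discrete parameters derive them from `comp_df.columns`, while the continuous numerical parameter in `numerical.py` returns `(self.name,)`. The sketch below reproduces that split with hypothetical stand-in classes and a plain encoding table in place of `comp_df`. The aggregation helper that concatenates the columns of several parameters is likewise an assumption about how a search-space-level property could be built, not baybe's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ToyParameter(ABC):
    """Hypothetical stand-in for a parameter base class."""

    name: str

    @property
    @abstractmethod
    def comp_rep_columns(self) -> tuple[str, ...]:
        """The columns spanning the computational representation."""


@dataclass
class ToyDiscreteParameter(ToyParameter):
    name: str
    # Hypothetical encoding table standing in for `comp_df`: label -> encoded row
    encoding: dict[str, dict[str, float]]

    @property
    def comp_rep_columns(self) -> tuple[str, ...]:
        # Analogous to `tuple(self.comp_df.columns)`: all rows share the same keys
        first_row = next(iter(self.encoding.values()))
        return tuple(first_row)


@dataclass
class ToyContinuousParameter(ToyParameter):
    name: str
    bounds: tuple[float, float]

    @property
    def comp_rep_columns(self) -> tuple[str, ...]:
        # A continuous parameter is represented by one column carrying its name
        return (self.name,)


def comp_rep_columns_of(parameters: list[ToyParameter]) -> tuple[str, ...]:
    """Hypothetical aggregation: concatenate the columns of all parameters."""
    return tuple(col for p in parameters for col in p.comp_rep_columns)


solvent = ToyDiscreteParameter(
    "Solvent",
    {
        "water": {"Solvent_water": 1.0, "Solvent_ethanol": 0.0},
        "ethanol": {"Solvent_water": 0.0, "Solvent_ethanol": 1.0},
    },
)
temperature = ToyContinuousParameter("Temperature", (20.0, 120.0))
print(comp_rep_columns_of([solvent, temperature]))
# ('Solvent_water', 'Solvent_ethanol', 'Temperature')
```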

5 changes: 5 additions & 0 deletions baybe/parameters/numerical.py
@@ -133,6 +133,11 @@ def is_in_range(self, item: float) -> bool: # noqa: D102

return self.bounds.contains(item)

@property
def comp_rep_columns(self) -> tuple[str, ...]: # noqa: D102
# See base class.
return (self.name,)

def summary(self) -> dict: # noqa: D102
# See base class.
param_dict = dict(
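On the parameter side, the CHANGELOG also fixes `CategoricalParameter` and `TaskParameter` incorrectly coercing a single string input into categories/tasks. A common cause of this kind of bug is that a Python `str` is itself an iterable of characters, so a naive `tuple(values)` splits `"water"` into single letters. A minimal guard is sketched below; `to_categories` is a hypothetical helper, and whether baybe rejects such input or handles it differently is not visible in this diff.

```python
from __future__ import annotations

from collections.abc import Iterable


def to_categories(values: Iterable[str] | str) -> tuple[str, ...]:
    """Hypothetical validator turning user input into a tuple of category labels."""
    # A bare string is iterable too, so tuple("water") would yield single letters
    if isinstance(values, str):
        raise TypeError(
            f"Expected a collection of labels, got the single string {values!r}. "
            f"Did you mean ({values!r},)?"
        )
    return tuple(values)


print(tuple("water"))  # ('w', 'a', 't', 'e', 'r')  <- the silent pitfall
print(to_categories(["water", "ethanol"]))  # ('water', 'ethanol')
```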

4 changes: 2 additions & 2 deletions baybe/recommenders/naive.py
@@ -122,7 +122,7 @@ def recommend( # noqa: D102
# Get discrete candidates. The metadata flags are ignored since the search space
# is hybrid
# TODO Slight BOILERPLATE CODE, see recommender.py, ll. 47+
_, candidates_comp = searchspace.discrete.get_candidates(
candidates_exp, _ = searchspace.discrete.get_candidates(
allow_repeated_recommendations=True,
allow_recommending_already_measured=True,
)
@@ -147,7 +147,7 @@ def recommend( # noqa: D102
# Call the private function of the discrete recommender and get the indices
disc_rec_idx = self.disc_recommender._recommend_discrete(
subspace_discrete=searchspace.discrete,
candidates_comp=candidates_comp,
candidates_exp=candidates_exp,
batch_size=batch_size,
)

22 changes: 11 additions & 11 deletions baybe/recommenders/pure/base.py
@@ -50,15 +50,15 @@ def recommend( # noqa: D102
def _recommend_discrete(
self,
subspace_discrete: SubspaceDiscrete,
candidates_comp: pd.DataFrame,
candidates_exp: pd.DataFrame,
batch_size: int,
) -> pd.Index:
"""Generate recommendations from a discrete search space.

Args:
subspace_discrete: The discrete subspace from which to generate
recommendations.
candidates_comp: The computational representation of all discrete candidate
candidates_exp: The experimental representation of all discrete candidate
points to be considered.
batch_size: The size of the recommendation batch.

@@ -67,14 +67,14 @@ def _recommend_discrete(

Returns:
The dataframe indices of the recommended points in the provided
computational representation.
experimental representation.
"""
# If this method is not implemented by a child class, try to resort to hybrid
# recommendation (with an empty subspace) instead.
try:
return self._recommend_hybrid(
searchspace=SearchSpace(discrete=subspace_discrete),
candidates_comp=candidates_comp,
candidates_exp=candidates_exp,
batch_size=batch_size,
).index
except NotImplementedError as exc:
@@ -110,7 +110,7 @@ def _recommend_continuous(
try:
return self._recommend_hybrid(
searchspace=SearchSpace(continuous=subspace_continuous),
candidates_comp=pd.DataFrame(),
candidates_exp=pd.DataFrame(),
batch_size=batch_size,
)
except NotImplementedError as exc:
@@ -126,7 +126,7 @@ def _recommend_continuous(
def _recommend_hybrid(
self,
searchspace: SearchSpace,
candidates_comp: pd.DataFrame,
candidates_exp: pd.DataFrame,
batch_size: int,
) -> pd.DataFrame:
"""Generate recommendations from a hybrid search space.
@@ -138,7 +138,7 @@ def _recommend_hybrid(
Args:
searchspace: The hybrid search space from which to generate
recommendations.
candidates_comp: The computational representation of all discrete candidate
candidates_exp: The experimental representation of all discrete candidate
points to be considered.
batch_size: The size of the recommendation batch.

@@ -175,7 +175,7 @@ def _recommend_with_discrete_parts(

# Get discrete candidates
# Repeated recommendations are always allowed for hybrid spaces
_, candidates_comp = searchspace.discrete.get_candidates(
candidates_exp, _ = searchspace.discrete.get_candidates(
allow_repeated_recommendations=is_hybrid_space
or self.allow_repeated_recommendations,
allow_recommending_already_measured=is_hybrid_space
@@ -184,7 +184,7 @@ def _recommend_with_discrete_parts(

# Check if enough candidates are left
# TODO [15917]: This check is not perfectly correct.
if (not is_hybrid_space) and (len(candidates_comp) < batch_size):
if (not is_hybrid_space) and (len(candidates_exp) < batch_size):
raise NotEnoughPointsLeftError(
f"Using the current settings, there are fewer than {batch_size} "
"possible data points left to recommend. This can be "
@@ -196,11 +196,11 @@ def _recommend_with_discrete_parts(

# Get recommendations
if is_hybrid_space:
rec = self._recommend_hybrid(searchspace, candidates_comp, batch_size)
rec = self._recommend_hybrid(searchspace, candidates_exp, batch_size)
idxs = rec.index
else:
idxs = self._recommend_discrete(
searchspace.discrete, candidates_comp, batch_size
searchspace.discrete, candidates_exp, batch_size
)
rec = searchspace.discrete.exp_rep.loc[idxs, :]
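The discrete path now passes `candidates_exp` around, and the returned index labels are used directly in `searchspace.discrete.exp_rep.loc[idxs, :]`. This works as long as candidate filtering preserves the index labels of the experimental representation. The stdlib sketch below models index labels as dictionary keys; `get_candidates` here is a simplified, hypothetical stand-in for `SubspaceDiscrete.get_candidates` that returns an `(exp, comp)` pair to match the unpacking `candidates_exp, _ = ...` in the diff, and taking the first `batch_size` labels is a placeholder for a real selection strategy.

```python
class NotEnoughPointsLeftError(Exception):
    """Raised when fewer candidates remain than the requested batch size."""


def get_candidates(exp_rep, comp_rep, measured, allow_recommending_already_measured=False):
    """Hypothetical filter returning (candidates_exp, candidates_comp).

    Both outputs keep the original index labels (here: dictionary keys).
    """
    keep = [
        idx
        for idx in exp_rep
        if allow_recommending_already_measured or idx not in measured
    ]
    return {i: exp_rep[i] for i in keep}, {i: comp_rep[i] for i in keep}


def recommend_discrete(exp_rep, comp_rep, measured, batch_size):
    """Return the recommended rows of the experimental representation."""
    candidates_exp, _ = get_candidates(exp_rep, comp_rep, measured)
    if len(candidates_exp) < batch_size:
        raise NotEnoughPointsLeftError(
            f"Fewer than {batch_size} possible data points left to recommend."
        )
    # Placeholder selection strategy: take the first `batch_size` labels
    idxs = list(candidates_exp)[:batch_size]
    # The labels refer to experimental-representation rows, so they can be
    # looked up directly, analogous to `exp_rep.loc[idxs, :]`
    return {i: exp_rep[i] for i in idxs}


exp_rep = {0: {"Solvent": "water"}, 1: {"Solvent": "ethanol"}, 2: {"Solvent": "acetone"}}
comp_rep = {0: {"Solvent_enc": 0.0}, 1: {"Solvent_enc": 1.0}, 2: {"Solvent_enc": 2.0}}
print(recommend_discrete(exp_rep, comp_rep, measured={0}, batch_size=2))
# {1: {'Solvent': 'ethanol'}, 2: {'Solvent': 'acetone'}}
```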

14 changes: 4 additions & 10 deletions baybe/recommenders/pure/bayesian/base.py
@@ -13,15 +13,14 @@
from baybe.recommenders.pure.base import PureRecommender
from baybe.searchspace import SearchSpace
from baybe.surrogates import CustomONNXSurrogate, GaussianProcessSurrogate
from baybe.surrogates.base import Surrogate
from baybe.utils.dataframe import to_tensor
from baybe.surrogates.base import SurrogateProtocol


@define
class BayesianRecommender(PureRecommender, ABC):
"""An abstract class for Bayesian Recommenders."""

surrogate_model: Surrogate = field(factory=GaussianProcessSurrogate)
surrogate_model: SurrogateProtocol = field(factory=GaussianProcessSurrogate)
"""The used surrogate model."""

acquisition_function: AcquisitionFunction = field(
@@ -51,14 +50,9 @@ def _setup_botorch_acqf(
measurements: pd.DataFrame,
) -> None:
"""Create the acquisition function for the current training data.""" # noqa: E501
# TODO: Transition point from dataframe to tensor needs to be refactored.
# Currently, surrogate models operate with tensors, while acquisition
# functions with dataframes.
train_x = searchspace.transform(measurements, allow_extra=True)
train_y = objective.transform(measurements)
self.surrogate_model._fit(searchspace, *to_tensor(train_x, train_y))
self.surrogate_model.fit(searchspace, objective, measurements)
self._botorch_acqf = self.acquisition_function.to_botorch(
self.surrogate_model, searchspace, train_x, train_y
self.surrogate_model, searchspace, objective, measurements
)

def recommend( # noqa: D102
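`BayesianRecommender.surrogate_model` is now typed as `SurrogateProtocol`, and `_setup_botorch_acqf` simply calls `fit(searchspace, objective, measurements)`, leaving all transformations to the surrogate. Because a `typing.Protocol` is matched structurally, any class with the right methods qualifies without inheriting from a baybe base class, which is how a protocol can take over the role of the removed `register_custom_architecture` decorator. The sketch below uses a simplified, hypothetical protocol with only `fit` and a mean-only `predict`; the real `SurrogateProtocol` in `baybe.surrogates.base` may require different or additional members, and `searchspace`/`objective` are plain placeholders here. Note that `isinstance` checks on a `runtime_checkable` protocol only verify that the methods exist, not their signatures.

```python
from statistics import mean
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MiniSurrogateProtocol(Protocol):
    """Hypothetical, reduced structural interface for surrogate models."""

    def fit(self, searchspace: Any, objective: Any, measurements: list[dict]) -> None:
        ...

    def predict(self, candidates: list[dict]) -> list[float]:
        ...


class MeanSurrogate:
    """User-defined model that satisfies the protocol without inheriting from it."""

    def __init__(self, target: str) -> None:
        self.target = target
        self._mean: float | None = None

    def fit(self, searchspace: Any, objective: Any, measurements: list[dict]) -> None:
        # A real surrogate would transform `measurements` into computational
        # representation here; this toy model only averages the target column.
        self._mean = mean(float(row[self.target]) for row in measurements)

    def predict(self, candidates: list[dict]) -> list[float]:
        if self._mean is None:
            raise RuntimeError("The surrogate must be fitted first.")
        return [self._mean for _ in candidates]


def setup_surrogate(
    surrogate: MiniSurrogateProtocol,
    searchspace: Any,
    objective: Any,
    measurements: list[dict],
) -> MiniSurrogateProtocol:
    """Hypothetical mirror of `_setup_botorch_acqf`: pass experimental data as-is."""
    surrogate.fit(searchspace, objective, measurements)
    return surrogate


model = setup_surrogate(MeanSurrogate("Yield"), None, None, [{"Yield": 10.0}, {"Yield": 20.0}])
print(isinstance(model, MiniSurrogateProtocol))  # True
print(model.predict([{}, {}]))  # [15.0, 15.0]
```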