Figuring out why fit_gpytorch_mll_scipy fails #2670

Hrovatin · 2025-01-06T14:02:54Z

Hrovatin
Jan 6, 2025

Getting fitting info from fit_gpytorch_mll

I was trying to fit a model with LKJCovariancePrior on IndexKernel using fit_gpytorch_mll, however, I often get the below warnings:

[/Users/karinhrovatin/miniforge3/envs/baybe/lib/python3.10/site-packages/linear_operator/utils/cholesky.py:40](http://localhost:8888/Users/karinhrovatin/miniforge3/envs/baybe/lib/python3.10/site-packages/linear_operator/utils/cholesky.py#line=39): NumericalWarning: A not p.d., added jitter of 1.0e-05 to the diagonal
  warnings.warn(
[/Users/karinhrovatin/miniforge3/envs/baybe/lib/python3.10/site-packages/botorch/optim/fit.py:102](http://localhost:8888/Users/karinhrovatin/miniforge3/envs/baybe/lib/python3.10/site-packages/botorch/optim/fit.py#line=101): OptimizationWarning: `scipy_minimize` terminated with status 3, displaying original message from `scipy.optimize.minimize`: ABNORMAL_TERMINATION_IN_LNSRCH

I am trying to figure out why adding the prior harms the training, but getting any optimization information from fit_gpytorch_mll seems very convoluted. I was wondering if anyone could suggest how to inspect the training behaviour of fit_gpytorch_mll (where fit_gpytorch_mll_scipy is used by default).

The only solution I found so far was increasing max_attempts but this seems suboptimal as optimization needs to be restarted multiple times. Also it makes me wonder if my model/prior may be somehow misspecified.

More details on my specific setting

(Not really relevant for the above question, but just in case)

I have module:

SingleTaskGP(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (mean_module): ConstantMean()
  (covar_module): ProductKernel(
    (kernels): ModuleList(
      (0): ScaleKernel(
        (base_kernel): MaternKernel(
          (lengthscale_prior): GammaPrior()
          (raw_lengthscale_constraint): Positive()
        )
        (outputscale_prior): GammaPrior()
        (raw_outputscale_constraint): Positive()
      )
      (1): IndexKernel(
        (raw_var_constraint): Positive()
        (cov_prior): LKJCovariancePrior()
      )
    )
  )
  (outcome_transform): Standardize()
  (input_transform): Normalize()
)

I train this on two sigmoid functions (2 tasks for the index kernel) where the two functions are more/less correlated due to shift. See this example (with some noise added for the plot):

shift =2 is always used, plus one of the other lines.

I use prior with small eta (<1, e.g. 0.1 or 0.5) and this should bias the index covariance towards large off-diagonal values. Here is example of samples drawn from priors with different eta values. cov/var shows the off diagonal element divided by the diagonal element of the sample

However, fit_gpytorch_mll most often fails on highly (anti)correlated curves (secondary line shift = 6 or 10), which is unexpected given that my prior should favour index kernel having strong (anti)correlation between tasks. The above warnings are less common when I increase eta (e.g. to 1 or 5).

Balandat · 2025-01-06T15:06:34Z

Balandat
Jan 6, 2025
Collaborator

If you want to inspect what's happening in the scipy optimization, you can pass

optimizer_kwargs={"callback": your_callback}

to fit_gpytorch_mll(), where your_callback() is a function you can craft to inspect the OptimizeResult that are generated in intermediate steps during the call of scipy.optimize.minimize: https://github.com/pytorch/botorch/blob/main/botorch/optim/fit.py#L74-L75

As to why the fitting fails with the prior with small eta values - My guess would be that this makes the optimization landscape harder or potentially causes numerical issues or generally . The pdf of the LKJ prior is pdf(\Sigma) \sim |\Sigma| ^ (\eta - 1), so as eta -> 0 the prior encourages small determinant (i.e. almost singular) cross-task correlation matrices. This will cause some precision issues which can potentially translate to the optimizer hitting issues during line search as suggested by your result. As you set eta=1 the prior is uninformative.

What's your actual application for this? Typically, when doing some kind of multi-task learning one will generally try to use an LKJ prior with eta > 1 to avoid negative transfer and only learn a high correlation between tasks if there is clear evidence in the data.

1 reply

Hrovatin Jan 7, 2025
Author

Thank you, I will try the callback.

Hrovatin · 2025-01-07T07:25:37Z

Hrovatin
Jan 7, 2025
Author

To me it was suggested that it would have been better to use eta <1 as we believe that tasks that we add should hopefully be related in some way.

Currently, we are searching for a way to improve training when using botorch.fit.fit_gpytorch_mll instead of old botorch.optim.fit.fit_gpytorch_mll_torch(mll, step_limit=200) as our BO performance degrades when using the former in many use-cases. One idea was using priors, where they are still missing, to regularise the training.

4 replies

Hrovatin Jan 8, 2025
Author

Also, on the botorch MultiTaskGP the default for eta is 0.5

botorch/botorch/models/multitask.py

Line 371 in 7715ff4

eta = prior_config.get("eta", 0.5)

Balandat Jan 8, 2025
Collaborator

our BO performance degrades when using the former in many use-cases

So this is about BO performance degrading, rather than model performance degrading? Do you see any deterioration in model performance (as evaluated e.g. via cross validation) in your setting even though the model uses the same specification (and is just fit differently)? If so, this may be more of an issue with using an over-exploitative acquisition function (where fitting the model not to NLL convergence acts as a regularizer and encourages additional exploration).

Hrovatin Jan 8, 2025
Author

Thank you for the suggestion. I will look at this more.
What I already tried is using higher LKJCov eta which reduces correlation between tasks, making it more similar to the under-fitted model. However, this did not improve BO performance - so maybe more general regularization of the whole GP would be needed.

saitcakmak Jan 8, 2025
Collaborator

on the botorch MultiTaskGP the default for eta is 0.5

Note that this is only used if prior_config is not None:. We do not use a prior for the IndexKernel by default. In my limited experimentation with LKJ prior, I haven't found it to affect model fits in predictable ways. Other teammates I had talked to about it also said they had found it difficult to configure the prior.

Hrovatin · 2025-01-08T10:14:47Z

Hrovatin
Jan 8, 2025
Author

Posting the solution I used for logging:

I created a callback logger:

class OptimizationLogger:
    def __init__(self):
        self.history = []

    def __call__(self, parameters, result):
        self.history.append({
            "parameters": {k: deepcopy(v.numpy(force=True)) for k, v in parameters.items()},
            "result": result
        })

An instance of this logger is then passed to

botorch.fit.fit_gpytorch_mll(
            mll,
            optimizer_kwargs={'callback':optimizer_callback_instance}
        )

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figuring out why fit_gpytorch_mll_scipy fails #2670

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 5 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Figuring out why fit_gpytorch_mll_scipy fails #2670

Hrovatin Jan 6, 2025

Getting fitting info from fit_gpytorch_mll

More details on my specific setting

Replies: 3 comments · 5 replies

Balandat Jan 6, 2025 Collaborator

Hrovatin Jan 7, 2025 Author

Hrovatin Jan 7, 2025 Author

Hrovatin Jan 8, 2025 Author

Balandat Jan 8, 2025 Collaborator

Hrovatin Jan 8, 2025 Author

saitcakmak Jan 8, 2025 Collaborator

Hrovatin Jan 8, 2025 Author

Hrovatin
Jan 6, 2025

Replies: 3 comments 5 replies

Balandat
Jan 6, 2025
Collaborator

Hrovatin Jan 7, 2025
Author

Hrovatin
Jan 7, 2025
Author

Hrovatin Jan 8, 2025
Author

Balandat Jan 8, 2025
Collaborator

Hrovatin Jan 8, 2025
Author

saitcakmak Jan 8, 2025
Collaborator

Hrovatin
Jan 8, 2025
Author