Replies: 3 comments 5 replies
-
If you want to inspect what's happening in the scipy optimization, you can pass a callback to fit_gpytorch_mll. As to why the fitting fails with the prior with small eta: what's your actual application for this? Typically, when doing some kind of multi-task learning one will generally try to use an LKJ prior with eta >= 1.
-
To me it was suggested that it would have been better to use eta < 1, as we believe that the tasks we add should be related in some way. Currently, we are searching for a way to improve training when using this prior.
-
Posting the solution I used for logging: I created a callback logger.
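A minimal sketch of such a logger, assuming the callback signature used by BoTorch's fit_gpytorch_mll_scipy (a dict of parameter tensors plus a per-step result object with `step` and `fval` fields) — the class name and stored fields here are illustrative, not the poster's exact code:

```python
from collections import defaultdict

class CallbackLogger:
    """Record the trace of the scipy optimization for later inspection."""

    def __init__(self):
        self.history = defaultdict(list)

    def __call__(self, parameters, result):
        # `parameters`: dict of the model's parameter tensors at this step;
        # `result`: per-step summary (step index, objective value, ...)
        self.history["step"].append(result.step)
        self.history["fval"].append(result.fval)
        for name, value in parameters.items():
            self.history[name].append(value.detach().clone())
```

An instance could then be routed into the fit, e.g. `fit_gpytorch_mll(mll, optimizer_kwargs={"callback": logger})` — the `optimizer_kwargs` routing is an assumption here; check how your BoTorch version forwards kwargs to fit_gpytorch_mll_scipy.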
An instance of this logger is then passed to fit_gpytorch_mll.
-
Getting fitting info from fit_gpytorch_mll
I was trying to fit a model with an LKJCovariancePrior on an IndexKernel using fit_gpytorch_mll; however, I often get the below warnings:

I am trying to figure out why adding the prior harms the training, but getting any optimization information from fit_gpytorch_mll seems very convoluted. I was wondering if anyone could suggest how to inspect the training behaviour of fit_gpytorch_mll (where fit_gpytorch_mll_scipy is used by default).

The only solution I found so far was increasing max_attempts, but this seems suboptimal, as the optimization needs to be restarted multiple times. It also makes me wonder whether my model/prior may be somehow misspecified.

More details on my specific setting
(Not really relevant for the above question, but just in case)
I have the following module:
I train this on two sigmoid functions (2 tasks for the index kernel), where the two functions are more or less correlated depending on a shift. See this example (with some noise added for the plot): the curve with shift = 2 is always used, plus one of the other lines.
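The data-generation code was not captured, but the setup described could be produced along these lines (a sketch; the function name, x-range, and noise level are my assumptions):

```python
import torch

def make_two_task_data(shift=6.0, n=50, noise=0.05, seed=0):
    # Task 0: sigmoid with the fixed shift = 2; task 1: sigmoid shifted by
    # `shift` (e.g. 6 or 10), which controls how (anti)correlated the two
    # curves are over the training range.
    torch.manual_seed(seed)
    x = torch.linspace(0.0, 12.0, n)
    y0 = torch.sigmoid(x - 2.0) + noise * torch.randn(n)
    y1 = torch.sigmoid(x - shift) + noise * torch.randn(n)
    full_x = torch.cat([x, x])
    full_i = torch.cat([torch.zeros(n), torch.ones(n)]).long()  # task indices
    full_y = torch.cat([y0, y1])
    return full_x, full_i, full_y
```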
I use a prior with small eta (< 1, e.g. 0.1 or 0.5), which should bias the index covariance towards large off-diagonal values. Here is an example of samples drawn from priors with different eta values; cov/var shows the off-diagonal element divided by the diagonal element of the sample.
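The figure with the prior samples did not survive here, but the qualitative effect of eta can be reproduced with torch's LKJ distribution over correlation matrices (a simplification of the full LKJCovariancePrior: this samples correlations directly, so the off-diagonal/diagonal ratio is just the correlation coefficient):

```python
import torch

def mean_abs_correlation(eta, n_samples=2000, seed=0):
    # Small eta pushes mass towards |corr| near 1; large eta towards 0.
    torch.manual_seed(seed)
    lkj = torch.distributions.LKJCholesky(dim=2, concentration=eta)
    chol = lkj.sample((n_samples,))          # Cholesky factors, (n_samples, 2, 2)
    corr = chol @ chol.transpose(-1, -2)     # correlation matrices
    return corr[:, 0, 1].abs().mean().item()
```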
However, fit_gpytorch_mll most often fails on the highly (anti)correlated curves (secondary line with shift = 6 or 10), which is unexpected given that my prior should favour an index kernel with strong (anti)correlation between tasks. The above warnings are less common when I increase eta (e.g. to 1 or 5).