Multi-Task & Multi-Fidelity modeling #2388
Replies: 2 comments 2 replies
-
OK, while writing this I realized that no cost-aware acquisition function is necessary for the described problem. I can use a FixedFeatureAcquisitionFunction to fix the fidelity_feature column to the value of the highest fidelity and then use any acquisition function. But the question still stands: should you prefer one model over the other in this setup?
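For reference, the idea behind the fixed-feature approach can be sketched in plain Python: a wrapper pins the fidelity column to a constant before delegating to the base acquisition function, so the optimizer only searches over the remaining design dimensions. The function names and the toy acquisition function below are illustrative, not BoTorch's actual API:

```python
# Minimal sketch of the fixed-feature idea: the wrapper re-inserts the
# pinned fidelity value before calling the base acquisition function,
# so candidate search happens only in the reduced design space.

def fix_feature(acqf, column, value):
    """Return an acquisition function over the reduced design space."""
    def fixed_acqf(x_reduced):
        # Re-insert the fixed fidelity column at its original position.
        x_full = x_reduced[:column] + [value] + x_reduced[column:]
        return acqf(x_full)
    return fixed_acqf

# Toy acquisition function over [x0, x1, fidelity]: rewards high x0
# and high fidelity (purely illustrative).
base_acqf = lambda x: x[0] + 0.1 * x[2]

# Pin the fidelity column (index 2) to the highest fidelity, 1.0.
high_fid_acqf = fix_feature(base_acqf, column=2, value=1.0)

print(high_fid_acqf([0.5, 0.3]))  # evaluates base_acqf([0.5, 0.3, 1.0])
```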
-
A multi-fidelity model should work better when the fidelities are ordered, so that you know the lower-fidelity observations are worse than the higher-fidelity observations, not just different. But since you've fit both, you can look at the data rather than rely on theory, by checking which model has better cross-validation performance. Ideally, you'd do that on the target fidelity, if there is enough data on it.
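The cross-validation check can be made concrete with a toy leave-one-out comparison. Here two trivial predictors stand in for the two fitted GPs, and the model with the lower LOO error on the target-fidelity data wins; all names and data below are made up for illustration:

```python
# Toy illustration of the model-selection recipe: compare candidate
# models by leave-one-out (LOO) cross-validation error on the
# target-fidelity data only. Real use would refit the GPs; here two
# trivial predictors stand in for them.

def loo_mse(fit_predict, xs, ys):
    """Leave-one-out mean squared error of a fit-then-predict callable."""
    errs = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = fit_predict(train_x, train_y, xs[i])
        errs.append((pred - ys[i]) ** 2)
    return sum(errs) / len(errs)

def mean_model(train_x, train_y, x_new):
    # Ignores inputs entirely; predicts the training mean.
    return sum(train_y) / len(train_y)

def nearest_model(train_x, train_y, x_new):
    # Predicts the value of the nearest training point.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x_new))
    return train_y[i]

# Target-fidelity data only (made up: roughly y = 2x).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.0, 0.5, 1.0, 1.5, 2.0]

scores = {"mean": loo_mse(mean_model, xs, ys),
          "nearest": loo_mse(nearest_model, xs, ys)}
best = min(scores, key=scores.get)
print(best)  # "nearest": the local model wins on this smooth data
```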
-
Hi, I am currently trying to do a multi-fidelity optimization. I was trying it with the SingleTaskMultiFidelityGP, and now I am trying it with the MultiTaskGP.
The problem setup is actually this:
Observations at different discrete fidelity levels are available, but new candidates from the BO should only be inferred for the highest fidelity.
I am using both the SingleTaskMultiFidelityGP and MultiTaskGP to have a "knowledge transfer" between the data fidelities.
For the setup of the BO with the SingleTaskMultiFidelityGP I was following the discrete_multi_fidelity_bo tutorial. The whole setup seems kind of over the top, with the definition of a cost model and so on. I adjusted it to only infer new candidates for the highest fidelity in each BO iteration.
Here is a quick code snippet from the adjusted code, assuming that the test function has the correct attributes.
For the MultiTaskGP I defined a GenericMCObjective that selects only the output of the highest fidelity.
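The "select one output" objective described above can be sketched in plain Python: the MultiTaskGP treats each fidelity as a separate task, and the MC objective keeps only the highest-fidelity task's output of each posterior sample. The task index and sample values below are illustrative stand-ins, not BoTorch's GenericMCObjective signature:

```python
# Plain-Python sketch of an objective that keeps only the
# highest-fidelity output of each posterior draw.

HIGH_FID_INDEX = 2  # assumed: task 2 is the target (highest) fidelity

def highest_fidelity_objective(samples):
    """samples: list of posterior draws, each a list of per-task outputs."""
    return [draw[HIGH_FID_INDEX] for draw in samples]

draws = [
    [0.1, 0.4, 0.9],   # draw 1: outputs for tasks 0, 1, 2
    [0.2, 0.3, 0.7],   # draw 2
]
print(highest_fidelity_objective(draws))  # [0.9, 0.7]
```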
For that problem definition, what model is the better choice?
To be honest, I don't 100% get the DownsamplingKernel and the ExponentialDecayKernel from the original paper behind the SingleTaskMultiFidelityGP. What exactly are the iteration fidelity parameters?
Is the SingleTaskMultiFidelityGP, despite the rather complex set-up, better for this problem than the MultiTaskGP because of its kernel structure?
I was also considering fitting a polynomial (or some other function) to the low-fidelity data and then introducing it to a SingleTaskGP as a prior mean function. Might this be better than the other modeling choices?
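That prior-mean idea can be sketched without any GP machinery: fit a simple trend to the low-fidelity data and let the high-fidelity GP model only the residuals around it. A closed-form linear fit stands in for the polynomial fit here, and all data are made up for illustration:

```python
# Sketch of the "low-fidelity prior mean" idea: fit a trend to the
# cheap low-fidelity data, then treat it as the mean function of a GP
# on the high-fidelity data, so the GP only has to model the residual.

def linear_fit(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

# Low-fidelity observations: cheap but biased version of the target.
low_x = [0.0, 0.5, 1.0]
low_y = [0.2, 1.2, 2.2]   # roughly 2x + 0.2

prior_mean = linear_fit(low_x, low_y)

# High-fidelity residuals are what the GP would actually model.
high_x = [0.25, 0.75]
high_y = [0.55, 1.45]
residuals = [y - prior_mean(x) for x, y in zip(high_x, high_y)]
print(residuals)  # small residuals: the trend explains most of the signal
```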
I know these are more general questions, especially the last one, and the answer depends on the data, the dimensionality of the problem, and so on, but I am thankful for any help or recommendations.
Best regards,
Stefan Tönnis