Jacobian computation fails for last layer laplace #265

magnusross · 2024-12-05T13:53:28Z

I am having some issues running last layer laplace for a certain type of model. I have included a minimal example below:

import torch
from laplace import Laplace
from torch.utils.data import DataLoader, TensorDataset

import torch.nn as nn

# Define a new model for 2D input
class SingleLinearLayer2D(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(SingleLinearLayer2D, self).__init__()
        # self.layers = nn.ModuleList([nn.Linear(input_dim, output_dim), nn.Linear(output_dim, output_dim)])
        self.layers = nn.ModuleList([nn.Linear(input_dim, output_dim)])
        

    def forward(self, x):
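        # Apply each layer along the last dimension: (batch, 10, 10) -> (batch, 10, output_dim)
        for layer in self.layers:
            x = layer(x)
        # Flatten the two trailing dimensions into one: (batch, 10 * output_dim)
        return x.flatten(1, 2)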
    
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device}")
# Generate some 2D data
x = torch.randn(100, 10, 10, device=device)
y = torch.randn(100, 50, device=device)

# Create a DataLoader
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Create the new model
model = SingleLinearLayer2D(input_dim=10, output_dim=5)
model = model.to(device)

la = Laplace(model, 'regression', hessian_structure='kron', subset_of_weights='last_layer')
la.fit(dataloader)
try:
    la(x, pred_type="glm", n_samples=10, link_approx='mc')
except RuntimeError as e:
    print(f"Predict failed for kron: {e}")

la = Laplace(model, 'regression', hessian_structure='diag', subset_of_weights='last_layer')
try:
    la.fit(dataloader)
except RuntimeError as e:
    print(f"Fit failed for diag: {e}")

The error is always something like this:

  File "/Users/magnus/.conda/envs/laplace-tsf/lib/python3.12/site-packages/laplace/baselaplace.py", line 850, in fit
    loss_batch, H_batch = self._curv_closure(X, y, N=N)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/magnus/.conda/envs/laplace-tsf/lib/python3.12/site-packages/laplace/baselaplace.py", line 1857, in _curv_closure
    return self.backend.diag(X, y, N=N, **self._asdl_fisher_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/magnus/.conda/envs/laplace-tsf/lib/python3.12/site-packages/laplace/curvature/curvature.py", line 417, in diag
    Js, f = self.last_layer_jacobians(x) if self.last_layer else self.jacobians(x)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/magnus/.conda/envs/laplace-tsf/lib/python3.12/site-packages/laplace/curvature/curvature.py", line 162, in last_layer_jacobians
    Js = torch.einsum("kp,kij->kijp", phi, identity).reshape(bsize, output_size, -1)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/magnus/.conda/envs/laplace-tsf/lib/python3.12/site-packages/torch/functional.py", line 402, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: einsum(): the number of subscripts in the equation (2) does not match the number of dimensions (3) for operand 0 and no ellipsis was given
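
The last frame of the traceback can be reproduced with plain PyTorch, outside the library. This is only a sketch: it assumes `identity` is a batch of identity matrices of shape `(bsize, output_size, output_size)`, which the traceback does not show, and the sizes are picked to match the example above. The equation `"kp,kij->kijp"` gives `phi` two subscripts, so it works for 2-D features and raises as soon as `phi` has a third dimension.

```python
import torch

bsize, output_size = 4, 5

# Assumed stand-in for the `identity` operand in the failing line:
# one (output_size x output_size) identity matrix per sample.
identity = torch.eye(output_size).unsqueeze(0).repeat(bsize, 1, 1)

# 2-D features (batch, n_features): the equation matches and the reshape works.
phi_2d = torch.randn(bsize, 10)
Js = torch.einsum("kp,kij->kijp", phi_2d, identity).reshape(bsize, output_size, -1)
print(tuple(Js.shape))  # (4, 5, 50)

# 3-D features (batch, 10, 10): "kp" has two subscripts but the operand has three dims.
phi_3d = torch.randn(bsize, 10, 10)
try:
    torch.einsum("kp,kij->kijp", phi_3d, identity)
    failed = False
except RuntimeError as e:
    failed = True
    print(f"einsum failed: {e}")
```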

You can see that for the diag approximation the fit fails, while for the kron approximation the fit works but the GLM prediction fails. The issue is the same on CUDA and CPU. The same code works fine with subset_of_weights="all". I am not sure exactly which model architectures trigger this, but it seems to fail when each input sample has more than one dimension (not counting the batch dimension) while each output is one-dimensional. It also fails for models with multiple layers, as you can see from the commented-out line in the example.
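
One way to check where the extra dimension comes from is to record the shape of the tensor that enters the last linear layer. The sketch below uses a standard PyTorch forward hook on the model from the example; the batch size of 4 and the names `record_input` and `captured` are illustrative only.

```python
import torch
import torch.nn as nn


class SingleLinearLayer2D(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(input_dim, output_dim)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x.flatten(1, 2)


model = SingleLinearLayer2D(input_dim=10, output_dim=5)
captured = {}


def record_input(module, inputs, output):
    # `inputs` is a tuple of the positional arguments passed to the layer.
    captured["features"] = tuple(inputs[0].shape)
    captured["layer_out"] = tuple(output.shape)


handle = model.layers[-1].register_forward_hook(record_input)
out = model(torch.randn(4, 10, 10))
handle.remove()

print(captured["features"])   # (4, 10, 10): the last layer sees 3-D features
print(captured["layer_out"])  # (4, 10, 5)
print(tuple(out.shape))       # (4, 50): only the final output is 2-D
```

The features reaching the last layer are three-dimensional even though the model output is two-dimensional, which is consistent with the einsum error in the traceback.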

Sorry, I feel I haven't explained the issue very clearly, but hopefully the example gives enough information. Please let me know if I can provide anything else!

Thanks :)

magnusross · 2024-12-11T16:01:05Z

Note that I have also tried backends other than CurvlinopsEF (e.g. AsdlGGN), and that doesn't fix the problem.

wiseodd · 2024-12-12T21:20:30Z

Can you use this instead? #254

See also docs: https://aleximmer.github.io/Laplace/huggingface_example/#laplace-on-a-subset-of-an-llms-weights
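
A rough sketch of the "switching off gradients" idea, as I understand it: disable `requires_grad` on every parameter except those of the layer to be treated, then fit with `subset_of_weights='all'`. The helper `freeze_all_but` and the two-layer `nn.Sequential` model are hypothetical and only illustrate the PyTorch side; they are not taken from the library or its docs.

```python
import torch.nn as nn


def freeze_all_but(model: nn.Module, trainable: nn.Module) -> int:
    """Disable gradients everywhere except in `trainable`.

    Returns the number of parameters that still require gradients.
    """
    for p in model.parameters():
        p.requires_grad = False
    for p in trainable.parameters():
        p.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Hypothetical two-layer model; only the last layer stays trainable.
model = nn.Sequential(nn.Linear(10, 5), nn.Linear(5, 5))
n_trainable = freeze_all_but(model, model[-1])
print(n_trainable)  # 30 = 5 * 5 weights + 5 biases
```

After this, the Laplace object would be created as in the original example but with `subset_of_weights='all'`, the setting the report says already works for this model.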

magnusross · 2024-12-17T09:39:03Z

Thanks, I'll try that. From reading the docs and other issues, it seems that you lose some performance with the "switching off gradients" approach; is that the case? In particular, I would like to use the fast variance computation for predictions, since my use case has many (100s to 1000s of) outputs. Is that possible without using LLLaplace?

wiseodd mentioned this issue Dec 12, 2024

Should LLLaplace be deprecated? #254

Open
