
How to Use on_EWC/ DER++ to Handle Regression Tasks #48

Open
JieDengsc opened this issue Aug 28, 2024 · 3 comments

Comments

@JieDengsc

Specifically, I don't know how to modify the code, because my task is regression.

@loribonna
Collaborator

Hi @JieDengsc, Mammoth is not really designed to handle regression, but depending on your use case it may be easy to adapt.

Since the task is regression, I guess you probably want to define a domain-il task: without class labels you wouldn't know how to split the data into separate tasks. Taking the "perm-mnist" dataset (in datasets/perm_mnist.py) as an example, you could create a new file datasets/<your_dataset>.py and in it define a class that inherits from ContinualDataset and sets:

  • NAME: the name of your dataset
  • SETTING: this should just be set to domain-il
  • N_CLASSES_PER_TASK: set it to 1; this makes your backbone output a single value
  • SIZE: the size of your images, as a tuple
  • get_data_loaders: create and return the full datasets. These should return a tuple (sample, label, not_aug_sample) for train and (sample, label) for test. The not_aug_sample is fundamental for methods that use a buffer (such as DER++). The get_data_loaders function is called once per task, so following perm-mnist you could handle different tasks here by defining a different transform per task. Make sure you call store_masked_loaders at the end.
  • get_backbone: return the backbone architecture that will be optimized for your task
  • get_loss: return F.mse_loss instead of the cross-entropy loss used by the other datasets
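To make the train-set contract above concrete, here is a minimal sketch of the Dataset side (the class and file names are illustrative, not actual Mammoth code; only the (sample, label, not_aug_sample) return shape follows the convention described above):

```python
# Hypothetical sketch for a datasets/<your_dataset>.py train set.
# The ContinualDataset subclass wrapping it would then set NAME,
# SETTING = 'domain-il', N_CLASSES_PER_TASK = 1, SIZE, and have
# get_loss return F.mse_loss.
import torch
from torch.utils.data import Dataset


class RegressionTrainSet(Dataset):
    """Returns (sample, target, not_aug_sample), as buffer-based
    methods like DER++ require the un-augmented sample."""

    def __init__(self, data, targets, transform=None):
        self.data, self.targets = data, targets
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x, y = self.data[idx], self.targets[idx]
        not_aug_x = x.clone()  # keep an un-augmented copy for the buffer
        if self.transform is not None:
            x = self.transform(x)
        return x, y, not_aug_x
```

The test set would be the same minus the not_aug_sample, returning only (sample, label).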

Besides this, you will need to modify:

  • the evaluate function in utils/training.py, since it currently only supports accuracy;
  • lines 183/189 of utils/training.py, to avoid casting the labels to long (this was done to prevent errors on Windows).
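A regression-style replacement for the accuracy computation could look like the following sketch (the function name and loop structure are assumptions, not the actual utils/training.py code):

```python
# Hedged sketch: report mean squared error instead of accuracy.
import torch
import torch.nn.functional as F


@torch.no_grad()
def evaluate_regression(model, loader):
    model.eval()
    total_se, n = 0.0, 0
    for inputs, targets in loader:
        outputs = model(inputs).squeeze(-1)  # single-output backbone
        total_se += F.mse_loss(outputs, targets, reduction='sum').item()
        n += targets.numel()
    model.train()
    return total_se / n  # lower is better, unlike accuracy
```

Note that, unlike accuracy, lower is better here, so any "best model" bookkeeping based on the evaluate output would need its comparison flipped as well.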

If you don't want a "domain-il" setting and want to split the data according to some other policy, I still suggest defining a "domain-il" dataset and splitting the data in get_data_loaders.

We plan to introduce some regression tasks in the future. Let me know if yours is publicly available so that we may take a look at it.

@JieDengsc
Author

Hi @loribonna, thanks for your reply and suggestions; I will try them.

In addition, in ewc_on.py, why do you multiply by exp_cond_prob when computing the Fisher matrix?

```python
fish += exp_cond_prob * self.net.get_grads() ** 2
```

According to the paper, you only need to sum the squared gradients and average them at the end.

Please let me know if I've misunderstood anything. Thanks a lot.

@loribonna
Collaborator

loribonna commented Aug 29, 2024

The question is a bit of a rabbit hole and I'm not an expert on this, but the reason is that the Fisher information matrix is the expectation, under the model's predictive distribution, of the squared gradients of the log-likelihood. Each squared gradient therefore needs to be weighted by p(y|x), which is why we take the exp of the (negative) loss.
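Written out, the quantity being estimated is the standard (diagonal of the) Fisher information matrix:

```latex
F(\theta) = \mathbb{E}_{x \sim \mathcal{D}}\,
\mathbb{E}_{y \sim p_\theta(y \mid x)}
\left[ \nabla_\theta \log p_\theta(y \mid x)\,
       \nabla_\theta \log p_\theta(y \mid x)^{\top} \right]
```

With a cross-entropy loss, loss(x, y) = -log p(y|x), so exp(-loss) = p(y|x), which is exactly the exp_cond_prob weight applied to each squared gradient.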

I suggest you check out this paper and this discussion for more info.

Edit: while you could reuse the same EwC code in your regression scenario, I don't think the math would check out.
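To make the weighting concrete, here is a minimal standalone sketch of that diagonal Fisher estimate for a classifier (the function name and loop structure are illustrative, not the actual models/ewc_on.py code):

```python
# Hedged sketch of a per-sample diagonal Fisher estimate.
import torch
import torch.nn.functional as F


def fisher_diagonal(net, loader):
    """Estimate diag(F) = E_{y ~ p(y|x)}[ (d log p(y|x) / d theta)^2 ].
    Weighting each squared gradient by exp(-CE loss) = p(y|x) is what
    approximates the expectation over the model's own predictions."""
    fish, n = None, 0
    for x, y in loader:
        for xi, yi in zip(x, y):
            net.zero_grad()
            out = net(xi.unsqueeze(0))
            loss = F.nll_loss(F.log_softmax(out, dim=1), yi.unsqueeze(0))
            exp_cond_prob = torch.exp(-loss.detach())  # = p(y_i | x_i)
            loss.backward()
            grads = torch.cat([p.grad.flatten() for p in net.parameters()])
            sq = exp_cond_prob * grads ** 2
            fish = sq if fish is None else fish + sq
            n += 1
    return fish / n  # average over samples, as in the paper
```

Dropping the exp_cond_prob factor gives the plain "empirical Fisher" (squared gradients at the observed labels), which is the version the question describes.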
