# grad_results.md

File metadata and controls

6 lines (5 loc) · 474 Bytes

Averaging the weights of the monolithic method doesn't seem to work: it completely destroys what was learned, and the ER buffer is not large enough to relearn it.

  • Use Fisher information (EWC / momentum encoder) to mitigate the damage from averaging.
    • By chance, the two networks may offload two independent pieces of knowledge into the same part of the weights, so a plain average overwrites both.
  • Try some form of MAML / meta-learning, although I don't fully understand the dynamics of meta-learning yet. This seems to be an "inverse meta-learning problem"?
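The Fisher-information idea in the first bullet could look roughly like the sketch below: instead of a plain average, weight each parameter by a diagonal Fisher estimate so that the network that "cares" about a given weight dominates the merge. This is a toy NumPy illustration under my own assumptions (precomputed diagonal Fisher vectors, a simple `eps` for numerical stability), not the method from these notes:

```python
import numpy as np

def fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b, eps=1e-8):
    """Per-parameter average weighted by diagonal Fisher information.

    A parameter with high Fisher in one network (i.e. important to that
    network's loss) dominates the merged value, instead of being
    averaged away as in a plain mean.
    """
    w_a = fisher_a + eps
    w_b = fisher_b + eps
    return (w_a * theta_a + w_b * theta_b) / (w_a + w_b)

# Toy example: two 4-parameter "networks" that each store knowledge
# in a disjoint pair of weights.
theta_a = np.array([1.0, 0.0, 2.0, 0.0])
theta_b = np.array([0.0, 3.0, 0.0, 4.0])
fisher_a = np.array([10.0, 0.1, 10.0, 0.1])  # A's important params: 0, 2
fisher_b = np.array([0.1, 10.0, 0.1, 10.0])  # B's important params: 1, 3

merged = fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b)
# merged stays close to each network's value on its own important params,
# whereas a plain mean would halve all of them.
```

The failure mode described above (two networks offloading independent knowledge into the same weight region) would show up here as overlapping high-Fisher entries, which this scheme cannot disentangle; it only helps when the important parameters are mostly disjoint.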