# mini: mechanistic interpretability for neural interpretability
## Installation

### With pixi (recommended)

Prerequisites: pixi installed.

In the root directory, run `pixi install --manifest-path ./pyproject.toml`. This will create a conda env named `mini`.

### With other tools

All package dependencies are specified in `pyproject.toml`. You can format them as required by your favorite Python environment / package-management tool and install them with that tool (e.g. via pip, poetry, or conda directly, instead of with pixi).
## What mini does

Given:

- Neural data, in the form of binned spike counts
- Behavioral and/or environmental metadata
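As a concrete illustration of these two inputs, here is a toy example using numpy. The array names, shapes, and the Poisson-distributed counts are purely illustrative assumptions, not mini's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

n_bins, n_neurons = 1000, 50

# Neural data: binned spike counts, one row per time bin, one column per neuron.
spike_counts = rng.poisson(lam=2.0, size=(n_bins, n_neurons))

# Behavioral/environmental metadata ("natural features"), aligned to the same
# time bins — e.g. a per-bin flag for whether a stimulus was present.
stimulus_present = rng.integers(0, 2, size=n_bins).astype(bool)

print(spike_counts.shape, stimulus_present.shape)
```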
mini performs the following steps to find interpretable neural signatures underlying behavioral and/or environmental features (referred to collectively as "natural features"):
- Splits the neural data into train/val/test sets
- Trains an SAE (on the train + val splits) to reconstruct the neural data
- Validates the quality of the SAE by looking at:
  - Sparsity of the SAE features
  - Reconstruction quality of the neural data
- Ranks the SAE features by interpretability likelihood
- For each of the top-ranked SAE features, finds a corresponding natural feature with manual feedback
- Validates each SAE-natural feature pairing on the test split by:
  - Examining confusion-matrix metrics for co-occurrences of the natural feature with the SAE feature
  - Showing that the neural signature defined by the SAE feature can decode the natural feature as well as the entirety of the neural data can
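The first three steps above (splitting, encoding with an SAE, and checking sparsity and reconstruction quality) can be sketched in numpy. This is an illustrative sketch only: the split ratios are assumptions, and the SAE weights here are random stand-ins for a trained model, not mini's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the inputs (shapes are illustrative assumptions).
n_bins, n_neurons, n_sae_features = 600, 20, 64
X = rng.poisson(lam=2.0, size=(n_bins, n_neurons)).astype(float)

# 1. Split the neural data into train/val/test sets (60/20/20 by time bin).
idx = rng.permutation(n_bins)
train, val, test = np.split(idx, [int(0.6 * n_bins), int(0.8 * n_bins)])

# 2. An SAE maps activity to a sparse, overcomplete feature code and back.
#    Random weights here stand in for a model trained on train + val.
W_enc = rng.normal(scale=0.1, size=(n_neurons, n_sae_features))
W_dec = rng.normal(scale=0.1, size=(n_sae_features, n_neurons))

def sae_features(x):
    """ReLU encoder: nonnegative, mostly-zero feature activations."""
    return np.maximum(x @ W_enc, 0.0)

def reconstruct(x):
    """Linear decoder back to neural-activity space."""
    return sae_features(x) @ W_dec

# 3. Validate SAE quality on held-out data:
feats = sae_features(X[test])
sparsity = (feats > 0).mean()                          # fraction of active features
mse = ((X[test] - reconstruct(X[test])) ** 2).mean()   # reconstruction error
print(f"sparsity={sparsity:.3f}  reconstruction MSE={mse:.3f}")
```

A trained SAE would drive the active-feature fraction down (via a sparsity penalty) while keeping reconstruction error low; the untrained weights above only demonstrate how the two diagnostics are computed.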