Hi, nice blog, thanks for sharing it!
Just wanted to warn you that your Hugging Face and wandb keys are still visible in the training Colab you linked. Could you also make the wandb report public? It would be helpful for checking how much compute you needed.
Also, the image is missing for this description:
The second most frequent feature (feature index ...) in the Pythia 6.9B sparse autoencoder activates on the token "·the".
Sparse Autoencoders for a More Interpretable RLHF | Naomi Bashkansky
Extending Anthropic's recent monosemanticity results toward a new, more interpretable way to fine-tune.
https://naomibashkansky.com/blog/2023/sparse-autoencoders-for-interpretable-rlhf/