Stanley Ngugi (stanleyngugi)

AI research, mechanistic interpretability. Email: [email protected]

Popular repositories

surgical_knowledge_editing (Public)

This repository provides the official implementation of an "unlearn-then-learn" strategy that uses interpretability-driven circuit localization and the $(IA)^{3}$ PEFT method to achieve precise, su…

Python 2
mue_project (Public)

A PyTorch implementation of MUE, a minimalist framework that guides pre-trained diffusion models to autonomously explore novel, coherent outputs by leveraging local denoising instabilities as an un…

Python 1
tli_project (Public)

Python 1
taming_polysemanticity (Public)

A PyTorch toy model for mechanistic interpretability (MI) exploring incidental polysemanticity. This project uses an MLP and sparse autoencoder (SAE) to systematically ablate how training artifacts…

Python