UFO-101/auto-circuit

AutoCircuit

A library for efficient patching and automatic circuit discovery.

Read the paper

Transformer Circuit Metrics are not Robust (Oral spotlight, COLM 2024)

Getting Started

pip install auto-circuit

Easy and Efficient Edge Patching

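from auto_circuit.utils.graph_utils import patch_mode

# Edges are named "source->destination".
patch_edges = [
    "Resid Start->MLP 2",
    "MLP 2->A2.4.Q",
    "A2.4->Resid End",
]
# Inside the context manager, only the listed edges are patched.
with patch_mode(model, ablations, patch_edges):
    patched_out = model(tokens)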
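
To make the idea concrete, here is a minimal, library-independent sketch of what patching a single edge means. It assumes each destination node reads the sum of its incoming source outputs and treats those outputs as fixed numbers. The helpers `parse_edge` and `toy_dest_inputs` and all the values are hypothetical and not part of AutoCircuit; only the `"source->destination"` naming is taken from the snippet above.

```python
from typing import Dict, Iterable, Tuple


def parse_edge(name: str) -> Tuple[str, str]:
    """Split an edge name like 'MLP 2->A2.4.Q' into (source, destination)."""
    src, dest = name.split("->", 1)
    return src.strip(), dest.strip()


def toy_dest_inputs(
    edges: Iterable[str],
    clean_src_outs: Dict[str, float],
    ablation_src_outs: Dict[str, float],
    patch_edges: Iterable[str] = (),
) -> Dict[str, float]:
    """Sum the source outputs flowing into each destination node.

    For edges listed in patch_edges, the source's ablation value is used
    instead of its clean value. Every other edge is left untouched.
    """
    patched = {parse_edge(e) for e in patch_edges}
    dest_inputs: Dict[str, float] = {}
    for edge in edges:
        src, dest = parse_edge(edge)
        if (src, dest) in patched:
            value = ablation_src_outs[src]
        else:
            value = clean_src_outs[src]
        dest_inputs[dest] = dest_inputs.get(dest, 0.0) + value
    return dest_inputs


# Hypothetical three-edge graph with made-up scalar activations.
all_edges = ["Resid Start->MLP 2", "Resid Start->Resid End", "MLP 2->Resid End"]
clean = {"Resid Start": 1.0, "MLP 2": 2.0}
ablated = {"Resid Start": 0.5, "MLP 2": -1.0}

unpatched = toy_dest_inputs(all_edges, clean, ablated)
patched = toy_dest_inputs(all_edges, clean, ablated, patch_edges=["MLP 2->Resid End"])
print(unpatched)  # {'MLP 2': 1.0, 'Resid End': 3.0}
print(patched)    # {'MLP 2': 1.0, 'Resid End': 0.0}
```

Only the contribution along `MLP 2->Resid End` changes. The other edge into `Resid End` keeps its clean value, which is the difference between patching an edge and patching a whole node.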

Different Ablation Methods

ablations = src_ablations(model, test_loader, AblationType.TOKENWISE_MEAN_CORRUPT)
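
As a rough illustration of what a token-wise mean ablation could compute, the sketch below averages activations over a batch separately at each token position. This is one reading of the name `TOKENWISE_MEAN_CORRUPT` (a per-position mean taken over corrupt inputs) and is an assumption, not AutoCircuit's implementation. The function `tokenwise_mean` and the sample data are hypothetical.

```python
from typing import List


def tokenwise_mean(batch: List[List[float]]) -> List[float]:
    """Average a [batch, seq] list of activations over the batch dimension,
    keeping one mean per token position."""
    if not batch:
        raise ValueError("batch must contain at least one sequence")
    seq_len = len(batch[0])
    if any(len(seq) != seq_len for seq in batch):
        raise ValueError("all sequences must have the same length")
    n = len(batch)
    return [sum(seq[pos] for seq in batch) / n for pos in range(seq_len)]


# Hypothetical activations of one source node on three corrupt prompts,
# each four tokens long.
corrupt_acts = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 5.0, 2.0, 2.0],
]
mean_ablation = tokenwise_mean(corrupt_acts)
print(mean_ablation)  # [2.0, 3.0, 2.0, 2.0]
```

Under this reading, the resulting per-position values would stand in for a source node's output wherever one of its edges is patched.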

Automatic Circuit Discovery

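from auto_circuit.prune_algos.mask_gradient import mask_gradient_prune_scores
from auto_circuit.types import PruneScores

attribution_patching_scores: PruneScores = mask_gradient_prune_scores(
    model=model,
    dataloader=test_loader,
    official_edges=None,
    grad_function="logit",
    answer_function="avg_diff",
)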
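
The variable name suggests that these scores are attribution patching estimates. As a minimal sketch of that idea, the code below assumes the usual first-order approximation, in which an edge's score is the change in its activation (corrupt minus clean) multiplied by the gradient of the metric with respect to that activation. The scalar activations, gradients, and the helpers `attribution_scores` and `top_k_edges` are hypothetical and do not reflect the library's internals.

```python
from typing import Dict, List


def attribution_scores(
    clean_acts: Dict[str, float],
    corrupt_acts: Dict[str, float],
    grads: Dict[str, float],
) -> Dict[str, float]:
    """First-order estimate of how much the metric changes when each edge
    is patched: (corrupt - clean) * d(metric)/d(activation)."""
    return {
        edge: (corrupt_acts[edge] - clean_acts[edge]) * grads[edge]
        for edge in clean_acts
    }


def top_k_edges(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k edges with the largest absolute score."""
    return sorted(scores, key=lambda e: abs(scores[e]), reverse=True)[:k]


# Made-up per-edge activations and gradients for three edges.
clean = {"Resid Start->MLP 2": 1.0, "MLP 2->A2.4.Q": 2.0, "A2.4->Resid End": 0.5}
corrupt = {"Resid Start->MLP 2": 0.0, "MLP 2->A2.4.Q": 2.5, "A2.4->Resid End": 2.5}
grads = {"Resid Start->MLP 2": 0.5, "MLP 2->A2.4.Q": -2.0, "A2.4->Resid End": 1.5}

scores = attribution_scores(clean, corrupt, grads)
circuit = top_k_edges(scores, k=2)
print(scores)   # {'Resid Start->MLP 2': -0.5, 'MLP 2->A2.4.Q': -1.0, 'A2.4->Resid End': 3.0}
print(circuit)  # ['A2.4->Resid End', 'MLP 2->A2.4.Q']
```

Ranking by absolute score and keeping the top edges is one simple way to turn per-edge scores into a candidate circuit. The estimate is exact only when the metric is linear in the activations, so for a real model it is an approximation.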

Visualization

from auto_circuit.visualize import draw_seq_graph
fig = draw_seq_graph(model, prune_scores)

Cite this repo

@inproceedings{
  miller2024transformer,
  title={Transformer Circuit Evaluation Metrics Are Not Robust},
  author={Joseph Miller and Bilal Chughtai and William Saunders},
  booktitle={First Conference on Language Modeling},
  year={2024},
  url={https://openreview.net/forum?id=zSf8PJyQb2}
}
