maxtli/plausibleablation

# # Run inference on batched toxic samples.
# # Run inference on non-toxic samples.
# # Run inference on ablated non-toxic samples.
# # Take specific non-toxic examples from the fine-tuned model and run inference on them.
# # Do this 144 times, once per attention head. Do I need to save the indices? (also, ???)
# # (I guess this is just activation patching.)
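
The per-head loop in the notes above could look roughly like the sketch below. Everything in it is a toy assumption rather than this repository's code: the 12 x 12 layout (one way to arrive at 144 heads), the stand-in `head_output` function, and the scalar "logit" are all hypothetical. It only illustrates the shape of activation patching: cache each head's output from a source run, then swap one cached head at a time into a clean run and record the effect, keyed by `(layer, head)` index.

```python
import itertools

# Assumption: 12 layers x 12 heads = 144 heads (e.g. a GPT-2-small-sized model).
N_LAYERS, N_HEADS = 12, 12
HEAD_INDICES = list(itertools.product(range(N_LAYERS), range(N_HEADS)))


def head_output(layer, head, x):
    """Hypothetical stand-in for one attention head's contribution."""
    return (layer + 1) * 0.01 * x + head * 0.001


def run_model(x, patch=None):
    """Toy forward pass that sums every head's output into one scalar.

    `patch` is an optional ((layer, head), value) pair. When given, that
    head's output is replaced by `value` instead of being computed from x.
    """
    total = 0.0
    for layer, head in HEAD_INDICES:
        out = head_output(layer, head, x)
        if patch is not None and patch[0] == (layer, head):
            out = patch[1]
        total += out
    return total


def cache_heads(x):
    """Record every head's output on input x, keyed by (layer, head)."""
    return {(l, h): head_output(l, h, x) for l, h in HEAD_INDICES}


def patch_every_head(clean_x, source_x):
    """Patch one head at a time and return the change in output per head."""
    source_cache = cache_heads(source_x)
    baseline = run_model(clean_x)
    effects = {}
    for idx, value in source_cache.items():
        effects[idx] = run_model(clean_x, patch=(idx, value)) - baseline
    return effects


effects = patch_every_head(clean_x=1.0, source_x=-1.0)
```

Keying the results by `(layer, head)` is one way to answer the "save the indices" question: the index travels with each measurement, so nothing needs to be stored separately. With a real model, the cached values would be tensors captured from the source run rather than scalars.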

# # Do some arithmetic on the output logits.
# # Check the ablated loss on the toxic samples.
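
As a rough illustration of the last two notes, the sketch below computes a per-token logit shift and the change in mean cross-entropy loss between a clean run and an ablated run. The logits and targets are made-up numbers, and `cross_entropy` and `mean_loss` are hypothetical helpers written for this example; the notes do not say which arithmetic or loss the project actually uses.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def mean_loss(batch_logits, targets):
    """Average cross-entropy over a batch of logit vectors."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Made-up example values, not results from any model.
clean_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
ablated_logits = [[1.0, 0.8, -0.5], [0.4, 0.9, 0.6]]
targets = [0, 1]

# "Arithmetic on the output logits": element-wise ablated minus clean.
logit_shift = [
    [a - c for a, c in zip(ablated, clean)]
    for ablated, clean in zip(ablated_logits, clean_logits)
]

# "Ablated loss": how much the mean loss changes under ablation.
loss_increase = mean_loss(ablated_logits, targets) - mean_loss(clean_logits, targets)
```

A positive `loss_increase` means the ablation made the target tokens less likely on that batch.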
