Superficial Alignment

This repository contains the code for the paper "Extracting and Understanding the Superficial Knowledge in Alignment" (NAACL 2025).

Step 1: Extract token logits

bash scripts/extract_logit.sh

Step 2: Train linear model

bash scripts/train_logit.sh
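
This README does not say what the linear model maps between, so the following is only a toy illustration of a linear (affine) fit on logit values, not the repository's training code. The names `fit_affine`, `source_logits`, and `target_logits` are hypothetical, and the numbers are made up for the example.

```python
def fit_affine(xs, ys):
    """Closed-form least-squares fit of ys approximately equal to scale * xs + bias."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Variance of the inputs and covariance between inputs and targets.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    scale = cov_xy / var_x
    bias = mean_y - scale * mean_x
    return scale, bias


# Hypothetical logit values; the targets are an exact affine function of the
# sources, so the fit should recover scale 2 and bias 1 up to rounding.
source_logits = [2.0, 0.5, -1.0, 3.0]
target_logits = [2 * x + 1 for x in source_logits]
scale, bias = fit_affine(source_logits, target_logits)
```

A real model over a full vocabulary would use a weight matrix rather than a single scale and bias; the scalar case is used here only to keep the sketch dependency-free.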

Step 3: Run evaluation

bash scripts/run_eval.sh
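
The three steps are meant to be run in order. As a convenience, they could be chained from Python so that a failure in one step stops the rest. This is a minimal sketch, assuming the scripts are launched from the repository root; `run_pipeline` and `build_command` are hypothetical helpers that are not part of the repository, and only the script paths come from the steps above.

```python
import subprocess
from pathlib import Path

# Script paths are copied from the steps above, in the order they are listed.
STEPS = [
    ("Extract token logits", "scripts/extract_logit.sh"),
    ("Train linear model", "scripts/train_logit.sh"),
    ("Run eval", "scripts/run_eval.sh"),
]


def build_command(script):
    """Return the argv list used to launch one step."""
    return ["bash", script]


def run_pipeline(repo_root=".", runner=subprocess.run):
    """Run each step in order from repo_root and stop at the first failure.

    `runner` is injectable only so the function can be exercised without
    actually launching the scripts (an assumption of this sketch).
    """
    root = Path(repo_root)
    completed = []
    for name, script in STEPS:
        print(f"==> {name}: {script}", flush=True)
        # check=True makes subprocess.run raise CalledProcessError
        # when the script exits with a non-zero status.
        runner(build_command(script), cwd=root, check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root is then equivalent to running the three `bash` commands by hand, except that a non-zero exit status raises `subprocess.CalledProcessError` and later steps are skipped.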