This repository contains the code for the paper "Extracting and Understanding the Superficial Knowledge in Alignment" (NAACL 2025).
bash scripts/extract_logit.sh
bash scripts/train_logit.sh
bash scripts/run_eval.sh
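
The three commands above look like consecutive stages (extract, train, evaluate). That reading is an assumption based only on the order in which they are listed. Under that assumption, a minimal sketch of a wrapper that runs them in sequence and stops at the first failure could look like the following. The function name `run_pipeline` and its optional directory argument are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not shipped with the repository).
# Runs the three scripts in the order they are listed above and
# stops as soon as one of them exits with a non-zero status.
run_pipeline() {
  # Directory holding the scripts; defaults to ./scripts (assumed layout).
  local scripts_dir="${1:-scripts}"
  local step
  for step in extract_logit train_logit run_eval; do
    echo "running ${step}"
    bash "${scripts_dir}/${step}.sh" || {
      echo "failed: ${step}" >&2
      return 1
    }
  done
}
```

After sourcing this snippet from the repository root, calling `run_pipeline` with no arguments would run the three scripts from `scripts/`. Any arguments or environment variables the individual scripts need are not covered here.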