Merge pull request #10 from lauritowal/patch-2
Update README.md
AlexTMallen authored Sep 24, 2024
2 parents d4345a3 + 217926c commit c737428
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -5,8 +5,7 @@
Because language models are trained to predict the next token in naturally occurring text, they often reproduce common
human errors and misconceptions, even when they "know better" in some sense. More worryingly, when models are trained to
generate text that's rated highly by humans, they may learn to output false statements that human evaluators can't
-detect. We aim to circumvent this issue by directly [eliciting latent knowledge
-](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
+detect. We aim to circumvent this issue by directly [eliciting latent knowledge](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
of a language model.

Specifically, we're building on the **Contrastive Representation Clustering** (CRC) method described in the