Commit
Merge pull request #9 from lauritowal/joss-paper
Joss paper
AlexTMallen authored Sep 2, 2024
2 parents b350d08 + 129902a commit 8e72628
Showing 367 changed files with 828 additions and 410 deletions.
4 changes: 0 additions & 4 deletions .gitignore
@@ -1,10 +1,6 @@
*.csv
*.npy
elk/models/*
elk/trained/*
nohup.out
.idea
*.pkl

# scripts for experiments in progress
my_*.sh
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 EleutherAI
Copyright (c) 2024 EleutherAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,2 +1,2 @@
recursive-include elk/promptsource/templates *
recursive-include elk/resources *
recursive-include ccs/promptsource/templates *
recursive-include ccs/resources *
91 changes: 59 additions & 32 deletions README.md
@@ -5,8 +5,8 @@
Because language models are trained to predict the next token in naturally occurring text, they often reproduce common
human errors and misconceptions, even when they "know better" in some sense. More worryingly, when models are trained to
generate text that's rated highly by humans, they may learn to output false statements that human evaluators can't
detect. We aim to circumvent this issue by directly [**eliciting latent knowledge
**](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
detect. We aim to circumvent this issue by directly [eliciting latent knowledge
](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations
of a language model.

Specifically, we're building on the **Contrastive Representation Clustering** (CRC) method described in the
@@ -19,79 +19,104 @@ classification tasks, even though the features are trained without labels.

Our code is based on [PyTorch](http://pytorch.org)
and [Huggingface Transformers](https://huggingface.co/docs/transformers/index). We test the code on Python 3.10 and
3.11.
3.11. An example can be found [here](https://colab.research.google.com/drive/1pzcH55aHVXvfF0967hNixReG--gNT473?usp=sharing).

First install the package with `pip install -e .` in the root directory, or `pip install -e .[dev]` if you'd like to
contribute to the project (see **Development** section below). This should install all the necessary dependencies.
First, create a virtual environment by using e.g. conda:

```
conda create -n ccs python==3.10
conda activate ccs
```

Clone the repository:
```
git clone https://github.com/EleutherAI/ccs.git
```

Next, install the package with `pip install -e .` in the root directory. Use `pip install -e .[dev]` if you'd like to contribute to the project (see **Development** section below). This should install all the necessary dependencies.

To fit reporters for the HuggingFace model `model` and dataset `dataset`, just run:

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb
ccs elicit microsoft/deberta-v2-xxlarge-mnli imdb
```

This will automatically download the model and dataset, run the model and extract the relevant representations if they
aren't cached on disk, fit reporters on them, and save the reporter checkpoints to the `elk-reporters` folder in your
aren't cached on disk, fit reporters on them, and save the reporter checkpoints to the `ccs-reporters` folder in your
home directory. It will also evaluate the reporter classification performance on a held out test set and save it to a
CSV file in the same folder.

The following will generate a CCS (Contrast Consistent Search) reporter instead of the CRC-based reporter, which is the
default.

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb --net ccs
ccs elicit microsoft/deberta-v2-xxlarge-mnli imdb --net ccs
```

The following command will evaluate the probe from the run naughty-northcutt on the hidden states extracted from the
model deberta-v2-xxlarge-mnli for the imdb dataset. It will result in an `eval.csv` and `cfg.yaml` file, which are
stored under a subfolder in `elk-reporters/naughty-northcutt/transfer_eval`.
stored under a subfolder in `ccs-reporters/naughty-northcutt/transfer_eval`.

```bash
elk eval naughty-northcutt microsoft/deberta-v2-xxlarge-mnli imdb
ccs eval naughty-northcutt microsoft/deberta-v2-xxlarge-mnli imdb
```

The following runs `elicit` on the Cartesian product of the listed models and datasets, storing it in a special folder
ELK_DIR/sweeps/<memorable_name>. Moreover, `--add_pooled` adds an additional dataset that pools all of the datasets
CCS_DIR/sweeps/<memorable_name>. Moreover, `--add_pooled` adds an additional dataset that pools all of the datasets
together. You can also add a `--visualize` flag to visualize the results of the sweep.

```bash
elk sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled
ccs sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled
```
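A minimal sketch of which runs the sweep above covers, assuming a hypothetical helper `plan_sweep` (not part of `ccs`). Bash expands `gpt2-{medium,large,xl}` into three separate arguments before `ccs` sees them. The name given to the pooled entry (member datasets joined with `+`) is an assumption borrowed from the `"+".join(...)` naming that `Eval` uses for its transfer folders.

```python
from itertools import product


def plan_sweep(models, datasets, add_pooled=False):
    """Hypothetical helper: list the (model, dataset) pairs a sweep would run."""
    datasets = list(datasets)
    if add_pooled:
        # Assumption: the pooled entry is named by joining member datasets with "+".
        datasets.append("+".join(datasets))
    # Cartesian product: every model is paired with every dataset.
    return list(product(models, datasets))


# Bash expands gpt2-{medium,large,xl} into these three arguments.
models = ["gpt2-medium", "gpt2-large", "gpt2-xl"]
runs = plan_sweep(models, ["imdb", "amazon_polarity"], add_pooled=True)
for model, dataset in runs:
    print(model, dataset)
```

With `--add_pooled`, three models and two datasets give three dataset entries per model, so nine runs in total.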

If you just do `elk plot`, it will plot the results from the most recent sweep.
If you just do `ccs plot`, it will plot the results from the most recent sweep.
If you want to plot a specific sweep, you can do so with:

```bash
elk plot {sweep_name}
ccs plot {sweep_name}
```

## Caching

The hidden states resulting from `elk elicit` are cached as a HuggingFace dataset to avoid having to recompute them
The hidden states resulting from `ccs elicit` are cached as a HuggingFace dataset to avoid having to recompute them
every time we want to train a probe. The cache is stored in the same place as all other HuggingFace datasets, which is
usually `~/.cache/huggingface/datasets`.
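The word "usually" above depends on environment variables. The sketch below approximates the lookup order used by recent versions of HuggingFace `datasets`: `HF_DATASETS_CACHE` first, then `$HF_HOME/datasets`, where `HF_HOME` defaults to `$XDG_CACHE_HOME/huggingface` and `XDG_CACHE_HOME` defaults to `~/.cache`. The function name is hypothetical, and the exact rules may differ between `datasets` releases.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hf_datasets_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate the default cache directory of HuggingFace `datasets`."""
    env = os.environ if env is None else env
    # 1. An explicit HF_DATASETS_CACHE takes precedence.
    if "HF_DATASETS_CACHE" in env:
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    # 2. Otherwise use HF_HOME, which defaults to $XDG_CACHE_HOME/huggingface.
    xdg_cache = env.get("XDG_CACHE_HOME", "~/.cache")
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    # 3. The datasets cache is the "datasets" subfolder of HF_HOME.
    return Path(hf_home).expanduser() / "datasets"


print(hf_datasets_cache_dir())
```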

## Development
## Contribution Guidelines

Use `pip install pre-commit && pre-commit install` in the root folder before your first commit.
If you work on a new feature / fix or some other code task, make sure to create an issue and assign it to yourself.
You may also want to share it in the elk channel of Eleuther's Discord with a short note. That way, others know you are
working on the issue and nobody does the same thing twice 👍 Others can also contact you easily.

### Devcontainer
### Submitting a Pull Request
We welcome PRs to our libraries. They're an efficient way to include your fixes or improvements in our next release. Please follow these guidelines:

[
![Open in Remote - Containers](
https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode
)
](
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/EleutherAI/elk
)
- Focus on either functionality changes OR widespread style issues, not both.
- Add tests for new or modified functionality if it makes sense.
- Address a single issue or feature with minimal code changes.
- Include relevant documentation in the repo or on our docs site.

### Run tests
#### "fork-and-pull" Git workflow:

- Fork the repository to your GitHub account.
- Clone the project to your local machine.
- Create a new branch with a concise, descriptive name.
- Make and commit your changes to your new branch.
- Follow any repo-specific formatting and testing guidelines (see next section).
- Push the changes to your fork.
- Open a PR in our repository, using the PR template for efficient review.


#### Before committing
1. Use `pip install pre-commit && pre-commit install` in the root folder before your first commit.

2. Run tests

```bash
pytest
```

### Run type checking
3. Run type checking

We use [pyright](https://github.com/microsoft/pyright), which is built into the VSCode editor. If you'd like to run it
as a standalone tool, it requires a [nodejs installation.](https://nodejs.org/en/download/)
@@ -100,7 +125,7 @@ as a standalone tool, it requires a [nodejs installation.](https://nodejs.org/en
pyright
```

### Run the linter
4. Run the linter

We use [ruff](https://beta.ruff.rs/docs/). It is installed as a pre-commit hook, so you don't have to run it manually.
If you want to run it manually, you can do so with:
@@ -109,8 +134,10 @@ If you want to run it manually, you can do so with:
ruff . --fix
```

### Contributing to this repository
### Issues

If you work on a new feature / fix or some other code task, make sure to create an issue and assign it to yourself (
Maybe, even share it in the elk channel of Eleuther's Discord with a small note). In this way, others know you are
working on the issue and people won't do the same thing twice 👍 Also others can contact you easily.
Issues serve three main purposes: reporting library problems, requesting new features, and discussing potential changes before creating a Pull Request (PR). If you encounter a problem, first check if an existing Issue addresses it. If so, add your own reproduction information to that Issue instead of creating a new one. This approach prevents duplicate reports and helps maintainers understand the problem's scope. Additionally, adding a reaction (like a thumbs-up) to an existing Issue signals to maintainers that the problem affects multiple users, which can influence prioritization.

### Discussion and Contact

If you have additional questions, you can ask them in the elk channel of Eleuther's Discord: https://discord.gg/zBGx3azzUn
Empty file added ccs.lock
Empty file.
4 changes: 4 additions & 0 deletions elk/__init__.py → ccs/__init__.py
@@ -1,11 +1,15 @@
from .extraction import Extract, extract_hiddens
from .training import EigenFitter, EigenFitterConfig
from .training.train import Elicit
from .evaluation import Eval
from .truncated_eigh import truncated_eigh

__all__ = [
"EigenFitter",
"EigenFitterConfig",
"extract_hiddens",
"Extract",
"Elicit",
"Eval",
"truncated_eigh",
]
10 changes: 5 additions & 5 deletions elk/__main__.py → ccs/__main__.py
@@ -1,13 +1,13 @@
"""Main entry point for `elk`."""
"""Main entry point for `ccs`."""

from dataclasses import dataclass

from simple_parsing import ArgumentParser

from elk.evaluation.evaluate import Eval
from elk.plotting.command import Plot
from elk.training.sweep import Sweep
from elk.training.train import Elicit
from ccs.evaluation.evaluate import Eval
from ccs.plotting.command import Plot
from ccs.training.sweep import Sweep
from ccs.training.train import Elicit


@dataclass
6 changes: 5 additions & 1 deletion elk/debug_logging.py → ccs/debug_logging.py
@@ -31,7 +31,11 @@ def save_debug_log(datasets: list[DatasetDictWithName], out_dir: Path) -> None:
else:
train_split, val_split = select_train_val_splits(ds)

text_questions = ds[val_split][0]["text_questions"]
if len(ds[val_split]) == 0:
logging.warning(f"Val split '{val_split}' is empty!")
continue

text_questions = ds[val_split][0]["texts"]
template_ids = ds[val_split][0]["variant_ids"]
label = ds[val_split][0]["label"]
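A minimal sketch of the guard this hunk adds, assuming plain Python lists of dicts as a stand-in for a HuggingFace `DatasetDict`. The helper name `first_val_row` is hypothetical; the real code uses `continue` inside its loop, while this sketch returns `None` for an empty split instead of failing on the `[0]` lookup. It also reflects the column rename from `text_questions` to `texts`.

```python
import logging


def first_val_row(ds, val_split):
    """Return (texts, variant_ids, label) from the first validation row, or None.

    `ds` is a dict of lists of dicts standing in for a HuggingFace DatasetDict.
    """
    split = ds[val_split]
    if len(split) == 0:
        # Same idea as the patch: warn and skip rather than index an empty split.
        logging.warning(f"Val split '{val_split}' is empty!")
        return None
    row = split[0]
    return row["texts"], row["variant_ids"], row["label"]


example = {"validation": [{"texts": ["q0", "q1"], "variant_ids": ["v0"], "label": 1}]}
print(first_val_row(example, "validation"))
print(first_val_row({"validation": []}, "validation"))
```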

File renamed without changes.
57 changes: 45 additions & 12 deletions elk/evaluation/evaluate.py → ccs/evaluation/evaluate.py
@@ -6,8 +6,8 @@
import torch
from simple_parsing.helpers import field

from ..files import elk_reporter_dir
from ..metrics import evaluate_preds
from ..files import ccs_reporter_dir
from ..metrics import evaluate_preds, get_logprobs
from ..run import Run
from ..utils import Color

@@ -22,7 +22,7 @@ class Eval(Run):
def __post_init__(self):
# Set our output directory before super().execute() does
if not self.out_dir:
root = elk_reporter_dir() / self.source
root = ccs_reporter_dir() / self.source
self.out_dir = root / "transfer" / "+".join(self.data.datasets)

def execute(self, highlight_color: Color = "cyan"):
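A minimal sketch of the two paths `Eval` derives from the renamed `ccs_reporter_dir()`: the default transfer output folder set in `__post_init__` and the per-layer reporter checkpoint loaded in `apply_to_layer`. The helper names are hypothetical, and the root `~/ccs-reporters` is an assumption based on the README's description of the reporter folder.

```python
from pathlib import Path


def transfer_out_dir(reporter_root: Path, source: str, datasets: list[str]) -> Path:
    """Default output folder Eval.__post_init__ sets when out_dir is empty."""
    return reporter_root / source / "transfer" / "+".join(datasets)


def reporter_path(reporter_root: Path, source: str, layer: int) -> Path:
    """Checkpoint apply_to_layer loads for a single layer."""
    return reporter_root / source / "reporters" / f"layer_{layer}.pt"


# Assumption: ccs_reporter_dir() resolves to ~/ccs-reporters by default.
root = Path.home() / "ccs-reporters"
print(transfer_out_dir(root, "naughty-northcutt", ["imdb", "amazon_polarity"]))
print(reporter_path(root, "naughty-northcutt", 12))
```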
@@ -31,38 +31,61 @@ def execute(self, highlight_color: Color = "cyan"):
@torch.inference_mode()
def apply_to_layer(
self, layer: int, devices: list[str], world_size: int
) -> dict[str, pd.DataFrame]:
) -> tuple[dict[str, pd.DataFrame], dict]:
"""Evaluate a single reporter on a single layer."""
device = self.get_device(devices, world_size)
val_output = self.prepare_data(device, layer, "val")

experiment_dir = elk_reporter_dir() / self.source
experiment_dir = ccs_reporter_dir() / self.source

reporter_path = experiment_dir / "reporters" / f"layer_{layer}.pt"
reporter = torch.load(reporter_path, map_location=device)

out_logprobs = defaultdict(dict)
row_bufs = defaultdict(list)
for ds_name, (val_h, val_gt, val_lm_preds) in val_output.items():
for ds_name, val_data in val_output.items():
meta = {"dataset": ds_name, "layer": layer}
if self.save_logprobs:
out_logprobs[ds_name] = dict(
row_ids=val_data.row_ids.cpu(),
variant_ids=val_data.variant_ids,
texts=val_data.texts,
labels=val_data.labels.cpu(),
lm=dict(),
lr=dict(),
reporter=dict(),
)

val_credences = reporter(val_h)
val_credences = reporter(val_data.hiddens)
for mode in ("none", "partial", "full"):
row_bufs["eval"].append(
{
**meta,
"ensembling": mode,
**evaluate_preds(val_gt, val_credences, mode).to_dict(),
**evaluate_preds(
val_data.labels, val_credences, mode
).to_dict(),
}
)
if self.save_logprobs:
out_logprobs[ds_name]["reporter"][mode] = (
get_logprobs(val_credences, mode).detach().cpu()
)

if val_lm_preds is not None:
if val_data.lm_preds is not None:
row_bufs["lm_eval"].append(
{
**meta,
"ensembling": mode,
**evaluate_preds(val_gt, val_lm_preds, mode).to_dict(),
**evaluate_preds(
val_data.labels, val_data.lm_preds, mode
).to_dict(),
}
)
if self.save_logprobs:
out_logprobs[ds_name]["lm"][mode] = get_logprobs(
val_data.lm_preds, mode
).cpu()

lr_dir = experiment_dir / "lr_models"
if not self.skip_supervised and lr_dir.exists():
@@ -71,15 +94,25 @@ def apply_to_layer(
if not isinstance(lr_models, list): # backward compatibility
lr_models = [lr_models]

if self.save_logprobs:
out_logprobs[ds_name]["lr"][mode] = dict()

for i, model in enumerate(lr_models):
model.eval()
val_credences = model(val_data.hiddens)
if self.save_logprobs:
out_logprobs[ds_name]["lr"][mode][i] = get_logprobs(
val_credences, mode
).cpu()
row_bufs["lr_eval"].append(
{
"ensembling": mode,
"inlp_iter": i,
**meta,
**evaluate_preds(val_gt, model(val_h), mode).to_dict(),
**evaluate_preds(
val_data.labels, val_credences, mode
).to_dict(),
}
)

return {k: pd.DataFrame(v) for k, v in row_bufs.items()}
return {k: pd.DataFrame(v) for k, v in row_bufs.items()}, out_logprobs
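`apply_to_layer` now returns a pair (per-table rows, per-dataset log-probabilities) instead of only the rows, so any caller has to unpack two values. The sketch below mimics that shape for the reporter branch only. `metrics_by_mode` and `logprobs_by_mode` are hypothetical stand-ins for `evaluate_preds(...).to_dict()` and `get_logprobs(...)`, the metric name `acc` is a placeholder, rows stay as lists of dicts rather than DataFrames, and the `row_ids`/`texts`/`labels` metadata the real code stores is omitted.

```python
from collections import defaultdict

MODES = ("none", "partial", "full")


def apply_to_layer_sketch(ds_name, layer, metrics_by_mode, logprobs_by_mode,
                          save_logprobs=True):
    """Mimic the (rows, logprobs) pair returned by Eval.apply_to_layer."""
    row_bufs = defaultdict(list)
    out_logprobs = defaultdict(dict)
    meta = {"dataset": ds_name, "layer": layer}
    if save_logprobs:
        out_logprobs[ds_name] = dict(lm=dict(), lr=dict(), reporter=dict())
    for mode in MODES:
        # One row per ensembling mode, tagged with dataset and layer.
        row_bufs["eval"].append({**meta, "ensembling": mode, **metrics_by_mode[mode]})
        if save_logprobs:
            out_logprobs[ds_name]["reporter"][mode] = logprobs_by_mode[mode]
    return dict(row_bufs), dict(out_logprobs)


rows, logprobs = apply_to_layer_sketch(
    "imdb", 12, {m: {"acc": None} for m in MODES}, {m: [] for m in MODES}
)
print(rows["eval"][0])
print(sorted(logprobs["imdb"]["reporter"]))
```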
File renamed without changes.
File renamed without changes.
File renamed without changes.