
GPT2 Integrated Gradients - empty input gives false results #190

Closed
Victordmz opened this issue Jun 1, 2023 · 2 comments
Labels: bug (Something isn't working), to be investigated (Requires further inspection before sorting)

Comments


Victordmz commented Jun 1, 2023

🐛 Bug Report

When leaving the input text empty for GPT-2 with integrated gradients, the saliency map is incorrect and gives false results. The goal is to give only <|endoftext|>, the BOS token, as input (essentially letting GPT-2 generate from nothing), which can be done by leaving the input empty.

[Screenshot: incorrect saliency map produced with an empty input]

The problem is here:

sequences = self.attribution_model.formatter.get_text_sequences(self.attribution_model, batch)

@staticmethod
def get_text_sequences(attribution_model: "DecoderOnlyAttributionModel", batch: DecoderOnlyBatch) -> TextSequences:
    return TextSequences(
        sources=None,
        targets=attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True),
    )

The call to convert_tokens_to_string in this method leaves skip_special_tokens at its default of True, removing <|endoftext|> from the input. This also prevents a user from passing <|endoftext|> as the only input (and at the start of the generated text), since it is stripped from the input. In that case, running the attribution raises an error saying that the generated text does not begin with the input text.
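The stripping behavior can be seen in isolation with the Hugging Face tokenizer alone (a minimal sketch, independent of Inseq's internals):

import transformers

tokenizer = transformers.GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("<|endoftext|> This is a demo sentence.")["input_ids"]

# skip_special_tokens=True silently drops the BOS/EOS marker (id 50256):
print(tokenizer.decode(ids, skip_special_tokens=True))   # " This is a demo sentence."
print(tokenizer.decode(ids, skip_special_tokens=False))  # "<|endoftext|> This is a demo sentence."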

It can be resolved by temporarily changing the line to:

sequences = TextSequences(
    sources=None,
    targets=self.attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True, skip_special_tokens=False),
)

[Screenshot: saliency map after the temporary fix; all <|endoftext|> tokens receive zero attribution]

However, the feature attribution is zero for every <|endoftext|> token in the input and the output. I'm not sure whether this is intended; the same process with the ecco package assigns attribution to this token. Also, the first token (in this case This) gets zero attribution, which is probably not supposed to happen.

Summary:

  1. Visual glitch when leaving the GPT-2 input empty.
  2. Unable to pass <|endoftext|> as input because it is stripped during processing (a minimal repro follows this list).
  3. The temporary fix described above reveals that the feature attribution for <|endoftext|> is zero, which is probably incorrect.
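For point 2, a minimal repro based on the code sample below (on the version reported here, this call fails with the error described above, since the stripped BOS means the generated text no longer begins with the input):

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")

# skip_special_tokens=True strips <|endoftext|> from the processed input, so
# attribution errors out: the generated text does not begin with the input text.
model.attribute(
    "<|endoftext|>",
    "<|endoftext|> This is a demo sentence.",
).show()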

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Run the code sample.

Code sample

import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "",
    "This is a demo sentence."
).show()

Environment

  • OS: Windows 10
  • Python version: 3.10.9
  • Inseq version: 0.5.0.dev0 (pulled from the main branch on 1 June 2023)

Expected behavior

See bug report. For reference, this is the result from the ecco package on the same sentence, also using integrated gradients:
[Screenshot: ecco saliency map with non-zero attribution for <|endoftext|>]
I assume this would be correct; note, however, that ecco leaves the baseline at its default.

Victordmz added the bug label on Jun 1, 2023
gsarti (Member) commented Jun 2, 2023

Thank you for the detailed bug report @Victordmz, much appreciated!

  1. I recently stumbled on the empty-input problem myself. Indeed, the current implementation does not support unconstrained generation (i.e. with only BOS as prefix) using decoder-only models, and otherwise produces the visual bug you showed above. This is related to the fix needed for the next point.

  2. While the BOS token currently gets removed from the returned outputs (this was done to improve the readability of the matrices), in retrospect, this might have been a design mistake, and we might want to include it in the returned target sequence.

  3. The 0-attribution for the special token <|endoftext|> is a product of the baseline choice for the Integrated Gradients method. At the moment, we use the token associated with UNK in the model config as the baseline for the integral approximation:

if return_baseline:
    if include_eos_baseline:
        # Use UNK as the baseline at every position, including special tokens.
        baseline_ids = torch.ones_like(batch["input_ids"]).long() * self.tokenizer.unk_token_id
    else:
        # Use UNK as the baseline for regular tokens, but keep EOS as its own
        # baseline, so EOS positions get zero attribution by construction.
        baseline_ids_non_eos = batch["input_ids"].ne(self.eos_token_id).long() * self.tokenizer.unk_token_id
        baseline_ids_eos = batch["input_ids"].eq(self.eos_token_id).long() * self.eos_token_id
        baseline_ids = baseline_ids_non_eos + baseline_ids_eos

(see #123 for a proposed improvement enabling greater flexibility). In the case of GPT-2, the UNK token corresponds to the EOS token <|endoftext|>, so the attribution is 0 because baseline = target token. If the baseline were different from the token, it would be sufficient to pass the parameter include_eos_baseline=True (which we should soon rename to include_special_tokens_baseline) to model.attribute to obtain non-zero scores. I suspect the ecco library adopts a 0-vector baseline, following the original approach of the Integrated Gradients authors, hence obtaining non-zero attributions for the BOS token.
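The zero score follows directly from the IG formula: the attribution at each coordinate is scaled by (input − baseline), so it vanishes whenever the two coincide. A minimal sketch with Captum, the library Inseq builds on (the toy linear model here is purely illustrative, not Inseq's actual pipeline):

import torch
from captum.attr import IntegratedGradients

# Any differentiable toy model works for the illustration.
model = torch.nn.Linear(4, 1)
ig = IntegratedGradients(model)

x = torch.randn(1, 4)

# Baseline identical to the input: the (x - x') factor in the IG formula is
# zero at every coordinate, so the attribution is exactly zero regardless of
# the gradients along the interpolation path.
print(ig.attribute(x, baselines=x.clone()))  # tensor([[0., 0., 0., 0.]])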

To summarize, action points here would be:

  1. Remove the BOS-omission logic to enable attribution of unconstrained generation.
  2. Adjust the baseline creation logic and include_eos_baseline so that all special tokens (not just EOS) are excluded from the explanation by default, with the possibility of including them via include_special_tokens_baseline=True.

Would you be willing to help with any of these? I cannot commit to these improvements in the upcoming month, but can help out if you're willing to give it a shot!

gsarti added the to be investigated label on Jul 28, 2023
gsarti (Member) commented Nov 13, 2023

Update: the BOS omission logic was removed, and the current behavior in the main branch matches the one resulting from the temporary fix mentioned above.

This:

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "",
    "This is a demo sentence."
).show()

is now equivalent to this:

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "<|endoftext|>",
    "<|endoftext|> This is a demo sentence."
).show()

Closing this, as the choice of alternative baselines beyond the default UNK token (point 3 in the summary) is already documented in issue #123.

gsarti closed this as completed on Nov 13, 2023