
GPT2 Integrated Gradients - empty input gives false results #190

Closed
Victordmz opened this issue Jun 1, 2023 · 2 comments
Labels: bug (Something isn't working), to be investigated (Requires further inspection before sorting)

Comments


Victordmz commented Jun 1, 2023

🐛 Bug Report

When leaving the input text empty for GPT-2 with integrated gradients, the saliency map is incorrect and gives false results. The goal is to give only <|endoftext|>, the BOS token, as input (essentially letting GPT-2 generate from nothing), which can be done by leaving the input empty.

[Screenshot: incorrect saliency map produced with an empty input]

The problem is here:

sequences = self.attribution_model.formatter.get_text_sequences(self.attribution_model, batch)

@staticmethod
def get_text_sequences(attribution_model: "DecoderOnlyAttributionModel", batch: DecoderOnlyBatch) -> TextSequences:
    return TextSequences(
        sources=None,
        targets=attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True),
    )

The call to convert_tokens_to_string in this method leaves skip_special_tokens at its default of True, removing <|endoftext|> from the input. This also prevents a user from passing <|endoftext|> as the only input (and at the start of the generated text), since it is stripped from the input. In that case, running the attribution raises an error saying that the generated text does not begin with the input text.
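The stripping behavior can be seen in isolation with the Hugging Face tokenizer alone (a minimal sketch, independent of Inseq's internals):

import transformers

tokenizer = transformers.GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("<|endoftext|> This is a demo sentence.")["input_ids"]

# skip_special_tokens=True silently drops the BOS/EOS marker (id 50256):
print(tokenizer.decode(ids, skip_special_tokens=True))   # " This is a demo sentence."
print(tokenizer.decode(ids, skip_special_tokens=False))  # "<|endoftext|> This is a demo sentence."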

It can be resolved by temporarily changing the line to:

sequences = TextSequences(
    sources=None,
    targets=self.attribution_model.convert_tokens_to_string(batch.input_tokens, as_targets=True, skip_special_tokens=False),
)

[Screenshot: saliency map after the temporary fix; all <|endoftext|> tokens receive zero attribution]

However, the feature attribution is zero for every <|endoftext|> token in the input and the output. I'm not sure whether this is intended; the same process with the ecco package assigns attribution to this token. Also, the first token (in this case This) gets zero attribution, which is probably not supposed to happen.

Summary:

  1. Visual glitch when leaving the GPT-2 input empty.
  2. Unable to pass <|endoftext|> as input because it is stripped during processing (a minimal repro follows this list).
  3. The temporary fix described above reveals that the feature attribution for <|endoftext|> is zero, which is probably incorrect.
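For point 2, a minimal repro based on the code sample below (on the version reported here, this call fails with the error described above, since the stripped BOS means the generated text no longer begins with the input):

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")

# skip_special_tokens=True strips <|endoftext|> from the processed input, so
# attribution errors out: the generated text does not begin with the input text.
model.attribute(
    "<|endoftext|>",
    "<|endoftext|> This is a demo sentence.",
).show()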

🔬 How To Reproduce

Steps to reproduce the behavior:

  1. Run the code sample.

Code sample

import inseq
model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "",
    "This is a demo sentence."
).show()

Environment

  • OS: Windows 10
  • Python version: 3.10.9
  • Inseq version: 0.5.0.dev0 (pulled from the main branch on 1 June 2023)

Expected behavior

See bug report. For reference, this is the result from the ecco package on the same sentence, also using integrated gradients:
[Screenshot: ecco saliency map with non-zero attribution for <|endoftext|>]
I assume this would be correct; note, however, that ecco leaves the baseline at its default.

Victordmz added the bug label on Jun 1, 2023
gsarti (Member) commented Jun 2, 2023

Thank you for the detailed bug report @Victordmz, much appreciated!

  1. I recently stumbled on the empty-input problem myself. Indeed, the current implementation does not support unconstrained generation (i.e. with only BOS as prefix) using decoder-only models, and otherwise produces the visual bug you showed above. This is related to the fix needed for the next point.

  2. While the BOS token currently gets removed from the returned outputs (this was done to improve the readability of the matrices), in retrospect, this might have been a design mistake, and we might want to include it in the returned target sequence.

  3. The 0-attribution for the special token <|endoftext|> is a product of the baseline choice for the Integrated Gradients method. At the moment, we use the token associated with UNK in the model config as the baseline for the integral approximation:

if return_baseline:
    if include_eos_baseline:
        # Use UNK as the baseline at every position, including special tokens.
        baseline_ids = torch.ones_like(batch["input_ids"]).long() * self.tokenizer.unk_token_id
    else:
        # Use UNK as the baseline for regular tokens, but keep EOS as its own
        # baseline, so EOS positions get zero attribution by construction.
        baseline_ids_non_eos = batch["input_ids"].ne(self.eos_token_id).long() * self.tokenizer.unk_token_id
        baseline_ids_eos = batch["input_ids"].eq(self.eos_token_id).long() * self.eos_token_id
        baseline_ids = baseline_ids_non_eos + baseline_ids_eos

(see #123 for a proposed improvement enabling greater flexibility). In the case of GPT-2, the UNK token corresponds to the EOS token <|endoftext|>, so the attribution is 0 because baseline = target token. If the baseline were different from the token, it would be sufficient to pass the parameter include_eos_baseline=True (which we should soon rename to include_special_tokens_baseline) to model.attribute to obtain non-zero scores. I suspect the ecco library adopts a 0-vector baseline, following the original approach of the Integrated Gradients authors, hence obtaining non-zero attributions for the BOS token.
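The zero score follows directly from the IG formula: the attribution at each coordinate is scaled by (input − baseline), so it vanishes whenever the two coincide. A minimal sketch with Captum, the library Inseq builds on (the toy linear model here is purely illustrative, not Inseq's actual pipeline):

import torch
from captum.attr import IntegratedGradients

# Any differentiable toy model works for the illustration.
model = torch.nn.Linear(4, 1)
ig = IntegratedGradients(model)

x = torch.randn(1, 4)

# Baseline identical to the input: the (x - x') factor in the IG formula is
# zero at every coordinate, so the attribution is exactly zero regardless of
# the gradients along the interpolation path.
print(ig.attribute(x, baselines=x.clone()))  # tensor([[0., 0., 0., 0.]])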

To summarize, action points here would be:

  1. Remove the BOS-omission logic to enable attribution of unconstrained generation.
  2. Adjust the baseline creation logic and include_eos_baseline so that all special tokens (not just EOS) are excluded from the explanation by default, with the possibility of including them via include_special_tokens_baseline=True.

Would you be willing to help with any of these? I cannot commit to these improvements in the upcoming month, but can help out if you're willing to give it a shot!

gsarti added the to be investigated label on Jul 28, 2023
gsarti (Member) commented Nov 13, 2023

Update: the BOS omission logic was removed, and the current behavior in the main branch matches the one resulting from the temporary fix mentioned above.

This:

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "",
    "This is a demo sentence."
).show()

is now equivalent to this:

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")
model.attribute(
    "<|endoftext|>",
    "<|endoftext|> This is a demo sentence."
).show()

Closing this, as the choice of alternative baselines beyond the default UNK token (point 3 in the summary) is already documented in issue #123.

gsarti closed this as completed on Nov 13, 2023