feat: Add classifier explainability based on token importance #198
base: main
Conversation
I think the implementation contains a bug w.r.t. the tokenization: you retokenize tokens.
Here's what you could do: both the pipeline and the model already contain embeddings for all tokens separately; it's just the embedding matrix itself.
So getting the token logits is, in both cases, just (paraphrased):
model.model_head.predict_logits(model.model.embeddings)
No need to tokenize anything. So this is what I would start with, and then we'll see.
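A minimal sketch of that idea, using the paraphrased attribute names from the comment above (they are assumptions, not the actual API):

```python
import numpy as np

def all_token_logits(model) -> np.ndarray:
    """Score every vocabulary token in one pass over the embedding matrix.

    Each row of the static embedding matrix is already the embedding of a
    single token, so feeding the whole matrix to the classification head
    yields per-token logits directly -- no retokenization needed.
    """
    embeddings = model.model.embeddings                    # shape: (vocab_size, dim)
    logits = model.model_head.predict_logits(embeddings)   # shape: (vocab_size, n_classes)
    return logits
```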
Just some nitpicks. I'm curious why you only check the logits for the predicted class, and not all classes. It seems to me that that could, or should, be part of what it means for a classifier to be explainable.
for token_id in unique_ids:
    # Get the token string and logit
    token_str = model.tokenizer.id_to_token(token_id)
    token_logit = model.token_logits_cache.get(token_id)
Can this ever be None?
    if token_logit is None:
        continue
    # Get the logit for the predicted label
    score = float(token_logit[label_idx])
This looks kind of odd to me. Why would the logit of the predicted label, and not the collected logits of the token over all classes, determine the classification? For example, consider a situation in which a token gets high logits for 2 out of many classes. In that case, it might not be important at all, right?
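One way to take all classes into account would be to score a token by its margin rather than by its raw logit for the predicted class. A sketch (not part of this PR; names are illustrative):

```python
import numpy as np

def margin_score(token_logits: np.ndarray, label_idx: int) -> float:
    """Score a token by how strongly it favors the predicted class over the runner-up.

    token_logits has shape (n_classes,). A token with high logits for two of
    many classes gets a small margin here, and is therefore ranked as less
    important than one that favors the predicted class alone.
    """
    others = np.delete(token_logits, label_idx)
    return float(token_logits[label_idx] - others.max())
```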
    results.append((token_str, score))

# Sort tokens by descending score
results.sort(key=lambda x: x[1], reverse=True)
Could be nicer not to do an in-place sort, e.g.:
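```python
# Return a new sorted list instead of mutating `results` in place.
sorted_results = sorted(results, key=lambda x: x[1], reverse=True)
```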
unique_ids = set(input_ids[0])

# Identify tokens that are not yet cached and compute their logits
tokens_to_compute = [token_id for token_id in unique_ids if token_id not in model.token_logits_cache]
Maybe you can just do this in the loop below? For each token, you get it from the cache, and if you miss, you compute it. Just a small idea, something like:
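A sketch only; it reuses the cache attribute from the PR and assumes a predict_logits(token_ids) helper like the one added here (the exact call site may differ):

```python
for token_id in unique_ids:
    token_logit = model.token_logits_cache.get(token_id)
    if token_logit is None:
        # Cache miss: compute and store the logits for this single token.
        token_logit = model.predict_logits([token_id])[0]
        model.token_logits_cache[token_id] = token_logit
    # ... scoring of token_logit continues as before
```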
@@ -246,6 +267,15 @@ def evaluate(

        return report

    def get_most_important_tokens(self, text: str) -> list[tuple[str, float]]:
I think you can leave these functions out? I don't really see why they exist, except that they strongly couple the modules where they could be decoupled before. That is, if this function doesn't exist, there is never a need to change this code when the explainability module changes, but now there suddenly is.
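For illustration, the call site could stay outside the model entirely, something like this (module and argument names are purely hypothetical):

```python
# Explainability as a free function that takes the trained classifier as an
# argument, instead of a method on the classifier itself.
from explainability import get_most_important_tokens  # hypothetical import path

tokens = get_most_important_tokens(classifier, "some input text")
```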
        mlp.out_activation_ = original_activation
        return logits

    def get_most_important_tokens(self, text: str) -> list[tuple[str, float]]:
Same comment as in the training code.
"""Predict the logits for the specified token IDs.""" | ||
# Extract embeddings for the specified token IDs. | ||
token_embeddings = self.embeddings[token_ids] | ||
mlp = self.head[-1] |
Just a fair warning that this doesn't work if the head is ever not a pipeline with an MLP as the final estimator, which we do support in principle.
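If this stays, a defensive check in front of the MLP-specific code might be worthwhile; a sketch, assuming the head is either a scikit-learn Pipeline or a bare estimator:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

head = self.head
final_estimator = head[-1] if isinstance(head, Pipeline) else head
if not isinstance(final_estimator, MLPClassifier):
    raise ValueError(
        "Token-logit explainability currently assumes an MLPClassifier as the "
        f"final estimator of the head, got {type(final_estimator).__name__}."
    )
mlp = final_estimator
```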
This PR adds explainability for classifiers by sorting the tokens of the input based on output-layer logits for the predicted class. A couple of open questions/concerns:
- Token logits are computed lazily when get_most_important_tokens is called. I think this is fine since it's really fast, but should we log this, or is it ok as is?